Re: linux-next: Tree for Nov 7
On Mon 13-11-17 09:35:22, Khalid Aziz wrote:
> On 11/13/2017 09:06 AM, Michal Hocko wrote:
> > OK, so this one should take care of the backward compatibility while
> > still not touching the arch code
> > ---
> > commit 39ff9bf8597e79a032da0954aea1f0d77d137765
> > Author: Michal Hocko
> > Date:   Mon Nov 13 17:06:24 2017 +0100
> >
> >     mm: introduce MAP_FIXED_SAFE
> >
> >     MAP_FIXED is used quite often but it is inherently dangerous because it
> >     unmaps an existing mapping covered by the requested range. While this
> >     might really be the desired behavior in many cases, there are
> >     others which would rather see a failure than a silent memory
> >     corruption.
> >
> >     Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
> >     It is a MAP_FIXED extension with a single exception: it fails with
> >     ENOMEM if the requested address is already covered by an existing
> >     mapping. We still rely on get_unmapped_area to handle all the arch
> >     specific MAP_FIXED treatment and check for a conflicting vma after it
> >     returns.
> >
> >     Signed-off-by: Michal Hocko
> >
> > ... deleted ...
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 680506faceae..aad8d37f0205 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1358,6 +1358,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> >  	if (mm->map_count > sysctl_max_map_count)
> >  		return -ENOMEM;
> >
> > +	/* force arch specific MAP_FIXED handling in get_unmapped_area */
> > +	if (flags & MAP_FIXED_SAFE)
> > +		flags |= MAP_FIXED;
> > +
> >  	/* Obtain the address to map to. we verify (or select) it and ensure
> >  	 * that it represents a valid section of the address space.
> >  	 */
>
> Do you need to move this code above:
>
> 	if (!(flags & MAP_FIXED))
> 		addr = round_hint_to_min(addr);
>
> 	/* Careful about overflows.. */
> 	len = PAGE_ALIGN(len);
> 	if (!len)
> 		return -ENOMEM;
>
> Not doing that might mean the hint address will end up being rounded for
> MAP_FIXED_SAFE, which would change the behavior from MAP_FIXED.
Yes, I will move it.
-- 
Michal Hocko
SUSE Labs
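The danger the patch guards against is easy to demonstrate from userspace: MAP_FIXED silently discards whatever was already mapped in the requested range. A minimal Linux-only sketch of that behavior (for context: this MAP_FIXED_SAFE work was later merged under the name MAP_FIXED_NOREPLACE, which turns the silent replace below into an error):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Returns 1 if MAP_FIXED silently replaced an existing mapping (the
 * behavior MAP_FIXED_SAFE turns into an -ENOMEM failure), -1 on error. */
static int map_fixed_clobbers(void)
{
	const size_t len = 4096;
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	p[0] = 42;	/* leave a mark in the original mapping */

	/* Map over the exact same range: no error is reported, but the
	 * old contents are gone -- the "silent memory corruption". */
	unsigned char *q = mmap(p, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
				-1, 0);
	if (q == MAP_FAILED)
		return -1;

	int clobbered = (q == p) && (q[0] == 0); /* fresh anon pages read 0 */
	munmap(p, len);
	return clobbered;
}
```

With the proposed flag, the second mmap() above would instead fail with ENOMEM because the range is already covered by a vma.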
Re: [PATCHv4 0/6] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers
On (11/13/17 18:17), Helge Deller wrote:
> On 10.11.2017 00:48, Sergey Senozhatsky wrote:
> > All Ack-s/Tested-by-s were dropped, since the patch set has been
> > reworked. I'm kindly asking arch maintainers and developers to test it
> > once again. Sorry for any inconvenience and thanks for your help in
> > advance.
>
> I tested it successfully on parisc64.
> You can add back the
> Tested-by: Helge Deller # parisc64

thanks!

-ss
Re: [PATCH kernel v2] powerpc/powernv/ioda: Remove explicit max window size check
On Tue, Nov 07, 2017 at 02:43:01PM +1100, Alexey Kardashevskiy wrote:
> DMA windows can only have a size of power of two on IODA2 hardware and
> using memory_hotplug_max() to determine the upper limit won't work
> correctly if it returns a non-power-of-two value.
>
> This removes the check as the platform code does this check in
> pnv_pci_ioda2_setup_default_config() anyway; the other client is VFIO
> and that checks against the locked_vm limit, which prevents the userspace
> from locking too much memory.
>
> It is expected to impact DPDK on machines with non-power-of-two RAM size,
> mostly. KVM guests are less likely to be affected as usually guests get
> less than half of the host's RAM.
>
> Signed-off-by: Alexey Kardashevskiy

Reviewed-by: David Gibson

This _can_ be called with essentially arbitrary window_size via the
VFIO code. However, that should be constrained by the locked memory
limit, which is checked before this gets called.

> ---
> Changes:
> v2:
> * instead of relaxing the check, this simply removes it
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 1de94fb..433cf84 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2776,7 +2776,7 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
> 	if (!levels || (levels > POWERNV_IOMMU_MAX_LEVELS))
> 		return -EINVAL;
>
> -	if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))
> +	if (!is_power_of_2(window_size))
> 		return -EINVAL;
>
> 	/* Adjust direct table size from window_size and levels */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

signature.asc
Description: PGP signature
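For reference, the is_power_of_2() helper the retained check relies on (from the kernel's include/linux/log2.h) reduces to a single bit trick: a power of two has exactly one bit set, so clearing the lowest set bit with n & (n - 1) must leave zero. A userspace equivalent:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace equivalent of the kernel's is_power_of_2(): nonzero with
 * exactly one bit set.  Zero must be rejected explicitly, because
 * 0 & (0 - 1) is also 0. */
static bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

This is why a memory_hotplug_max() value that is not itself a power of two makes a poor upper bound for window sizes: the largest valid window is the highest power of two at or below it, not the value itself.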
Re: [PATCH] ibmveth: Kernel crash LSO offload flag toggle
Hi Bryant,

A few things:

1) The commit message could probably be trimmed, especially the stack
   traces.

2) What exactly are you changing and why does it fix the issue? I
   couldn't figure that out from the commit message.

3) Does this need a Fixes: tag?

> 	}
>
> -	netdev->min_mtu = IBMVETH_MIN_MTU;
> -	netdev->max_mtu = ETH_MAX_MTU;
> -

4) What does the above hunk do? It seems unrelated...

Regards,
Daniel
Re: [PATCH kernel] vfio/spapr: Add trace points for map/unmap
On 27/10/17 14:00, Alexey Kardashevskiy wrote:
> This adds trace_map/trace_unmap tracepoints to the spapr driver. Type1 already
> uses these via the IOMMU API (iommu_map/__iommu_unmap).
>
> Signed-off-by: Alexey Kardashevskiy

Ping?

> ---
>
> Example:
>  qemu-system-ppc-8655  [096]  724.662740: unmap: IOMMU: iova=0x3000 size=4096 unmapped_size=4096
>  qemu-system-ppc-8656  [104]  724.970912: map:   IOMMU: iova=0x0800 paddr=0x7ffef7ff size=65536
> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 63112c36ab2d..4531486c77c6 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -22,6 +22,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -502,17 +503,19 @@ static int tce_iommu_clear(struct tce_container *container,
> 		struct iommu_table *tbl,
> 		unsigned long entry, unsigned long pages)
>  {
> -	unsigned long oldhpa;
> +	unsigned long oldhpa, unmapped, firstentry = entry, totalpages = pages;
> 	long ret;
> 	enum dma_data_direction direction;
>
> -	for ( ; pages; --pages, ++entry) {
> +	for (unmapped = 0; pages; --pages, ++entry) {
> 		direction = DMA_NONE;
> 		oldhpa = 0;
> 		ret = iommu_tce_xchg(tbl, entry, &oldhpa, &direction);
> 		if (ret)
> 			continue;
>
> +		++unmapped;
> +
> 		if (direction == DMA_NONE)
> 			continue;
>
> @@ -523,6 +526,9 @@ static int tce_iommu_clear(struct tce_container *container,
>
> 		tce_iommu_unuse_page(container, oldhpa);
> 	}
> +	trace_unmap(firstentry << tbl->it_page_shift,
> +			totalpages << tbl->it_page_shift,
> +			unmapped << tbl->it_page_shift);
>
> 	return 0;
>  }
> @@ -965,6 +971,8 @@ static long tce_iommu_ioctl(void *iommu_data,
> 				direction);
>
> 		iommu_flush_tce(tbl);
> +		if (!ret)
> +			trace_map(param.iova, param.vaddr, param.size);
>
> 		return ret;
> 	}

-- 
Alexey
[PATCH] ibmveth: Kernel crash LSO offload flag toggle
The following script, when run (along with some iperf traffic), recreates
the crash within 5-10 minutes or so:

while true
do
	ethtool -k ibmveth0 | grep tcp-segmentation-offload
	ethtool -K ibmveth0 tso off
	ethtool -k ibmveth0 | grep tcp-segmentation-offload
	ethtool -K ibmveth0 tso on
done

Note: This issue happens the very first time large send offload is
turned off too (but the above script recreates the issue every time).

Stack trace output:
[76563.914380] NIP [c0063940] memcpy_power7+0x40/0x800
[76563.914387] LR [d0d31788] ibmveth_start_xmit+0x1c8/0x8d0 [ibmveth]
[76563.914392] Call Trace:
[76563.914396] [c000feab3270] [c000feab32d0] 0xc000feab32d0 (unreliable)
[76563.914407] [c000feab3360] [c09816f4] dev_hard_start_xmit+0x304/0x530
[76563.914415] [c000feab3440] [c09b6564] sch_direct_xmit+0x124/0x330
[76563.914423] [c000feab34e0] [c0981ddc] __dev_queue_xmit+0x26c/0x770
[76563.914431] [c000feab3580] [c0a1efc0] arp_xmit+0x30/0xd0
[76563.914438] [c000feab35f0] [c0a1f0f4] arp_send_dst.part.0+0x94/0xb0
[76563.914445] [c000feab3660] [c0a1fcb4] arp_solicit+0x114/0x2b0
[76563.914452] [c000feab3730] [c098d8f4] neigh_probe+0x84/0xd0
[76563.914460] [c000feab3760] [c09937cc] neigh_timer_handler+0xbc/0x320
[76563.914468] [c000feab37a0] [c014a3fc] call_timer_fn+0x5c/0x1c0
[76563.914474] [c000feab3830] [c014a8bc] run_timer_softirq+0x31c/0x3f0
[76563.914483] [c000feab3900] [c00bec58] __do_softirq+0x188/0x3e0
[76563.914490] [c000feab39f0] [c00bf128] irq_exit+0xc8/0x100
[76563.914498] [c000feab3a10] [c001f974] timer_interrupt+0xa4/0xe0
[76563.914505] [c000feab3a40] [c0002714] decrementer_common+0x114/0x180

Oops output:
[76563.914173] Unable to handle kernel paging request for data at address 0x
[76563.914197] Faulting instruction address: 0xc0063940
[76563.914205] Oops: Kernel access of bad area, sig: 11 [#1]
[76563.914210] SMP NR_CPUS=2048 NUMA pSeries
[76563.914217] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag
nls_utf8 isofs binfmt_misc pseries_rng rtc_generic autofs4 ibmvfc scsi_transport_fc ibmvscsi ibmveth
[76563.914251] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0-34-generic #53-Ubuntu
[76563.914258] task: c000fe9efcc0 ti: c000feab task.ti: c000feab
[76563.914265] NIP: c0063940 LR: d0d31788 CTR: c0064100
[76563.914271] REGS: c000feab2ff0 TRAP: 0300 Not tainted (4.4.0-34-generic)
[76563.914277] MSR: 80009033 CR: 4800284e XER: 001a
[76563.914294] CFAR: c0008468 DAR: DSISR: 4200 SOFTE: 1
GPR00: f240 c000feab3270 c15b5d00
GPR04: cd9b0004 002a 0006 c000efc0ccac
GPR08: d0d3dd28 0080 d0d34758
GPR12: c0064100 ce7f1c80 c000ffdca938 0100
GPR16: c000ffdca738 c000ffdca538 c000feab34a0 c15f4d00
GPR20: c15f4cf0 c000f5945900
GPR24: 8000 c000feab32d0 c000efc0ccac
GPR28: c000f23ccb00 c000f5945000 c000f23ccb00 c000efc0cc00
[76563.914380] NIP [c0063940] memcpy_power7+0x40/0x800
[76563.914387] LR [d0d31788] ibmveth_start_xmit+0x1c8/0x8d0 [ibmveth]
[76563.914392] Call Trace:
[76563.914396] [c000feab3270] [c000feab32d0] 0xc000feab32d0 (unreliable)
[76563.914407] [c000feab3360] [c09816f4] dev_hard_start_xmit+0x304/0x530
[76563.914415] [c000feab3440] [c09b6564] sch_direct_xmit+0x124/0x330
[76563.914423] [c000feab34e0] [c0981ddc] __dev_queue_xmit+0x26c/0x770
[76563.914431] [c000feab3580] [c0a1efc0] arp_xmit+0x30/0xd0
[76563.914438] [c000feab35f0] [c0a1f0f4] arp_send_dst.part.0+0x94/0xb0
[76563.914445] [c000feab3660] [c0a1fcb4] arp_solicit+0x114/0x2b0
[76563.914452] [c000feab3730] [c098d8f4] neigh_probe+0x84/0xd0
[76563.914460] [c000feab3760] [c09937cc] neigh_timer_handler+0xbc/0x320
[76563.914468] [c000feab37a0] [c014a3fc] call_timer_fn+0x5c/0x1c0
[76563.914474] [c000feab3830] [c014a8bc] run_timer_softirq+0x31c/0x3f0
[76563.914483] [c000feab3900] [c00bec58] __do_softirq+0x188/0x3e0
[76563.914490] [c000feab39f0] [c00bf128] irq_exit+0xc8/0x100
[76563.914498] [c000feab3a10] [c001f974] timer_interrupt+0xa4/0xe0
[76563.914505] [c000feab3a40] [c0002714] decrementer_common+0x114/0x180
[76563.914515] --- interrupt: 901 at plpar_hcall_norets+0x1c/0x28
[76563.914515] LR = check_and_cede_processor+0x34/0x50
[76563.914525]
[PATCH] [net-next, v4] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver
This patch implements and enables VPD support for the ibmvnic driver.
Moreover, it includes the implementation of suitable structs, signal
transmission/handling and functions which allow the retrieval of firmware
information from the ibmvnic card through the ethtool command.

Signed-off-by: Desnes A. Nunes do Rosario
Signed-off-by: Thomas Falcon
---
 drivers/net/ethernet/ibm/ibmvnic.c | 153 -
 drivers/net/ethernet/ibm/ibmvnic.h |  27 ++-
 2 files changed, 176 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index b918bc2..04aaacb 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -574,6 +574,15 @@ static int reset_tx_pools(struct ibmvnic_adapter *adapter)
 	return 0;
 }
 
+static void release_vpd_data(struct ibmvnic_adapter *adapter)
+{
+	if (!adapter->vpd)
+		return;
+
+	kfree(adapter->vpd->buff);
+	kfree(adapter->vpd);
+}
+
 static void release_tx_pools(struct ibmvnic_adapter *adapter)
 {
 	struct ibmvnic_tx_pool *tx_pool;
@@ -754,6 +763,8 @@ static void release_resources(struct ibmvnic_adapter *adapter)
 {
 	int i;
 
+	release_vpd_data(adapter);
+
 	release_tx_pools(adapter);
 	release_rx_pools(adapter);
@@ -834,6 +845,57 @@ static int set_real_num_queues(struct net_device *netdev)
 	return rc;
 }
 
+static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter)
+{
+	struct device *dev = &adapter->vdev->dev;
+	union ibmvnic_crq crq;
+	dma_addr_t dma_addr;
+	int len = 0;
+
+	if (adapter->vpd->buff)
+		len = adapter->vpd->len;
+
+	reinit_completion(&adapter->fw_done);
+	crq.get_vpd_size.first = IBMVNIC_CRQ_CMD;
+	crq.get_vpd_size.cmd = GET_VPD_SIZE;
+	ibmvnic_send_crq(adapter, &crq);
+	wait_for_completion(&adapter->fw_done);
+
+	if (!adapter->vpd->len)
+		return -ENODATA;
+
+	if (!adapter->vpd->buff)
+		adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL);
+	else if (adapter->vpd->len != len)
+		adapter->vpd->buff =
+			krealloc(adapter->vpd->buff,
+				 adapter->vpd->len, GFP_KERNEL);
+
+	if (!adapter->vpd->buff) {
+		dev_err(dev, "Could not allocate VPD buffer\n");
+		return -ENOMEM;
+	}
+
+	adapter->vpd->dma_addr =
+		dma_map_single(dev, adapter->vpd->buff, adapter->vpd->len,
+			       DMA_FROM_DEVICE);
+	if (dma_mapping_error(dev, dma_addr)) {
+		dev_err(dev, "Could not map VPD buffer\n");
+		kfree(adapter->vpd->buff);
+		return -ENOMEM;
+	}
+
+	reinit_completion(&adapter->fw_done);
+	crq.get_vpd.first = IBMVNIC_CRQ_CMD;
+	crq.get_vpd.cmd = GET_VPD;
+	crq.get_vpd.ioba = cpu_to_be32(adapter->vpd->dma_addr);
+	crq.get_vpd.len = cpu_to_be32((u32)adapter->vpd->len);
+	ibmvnic_send_crq(adapter, &crq);
+	wait_for_completion(&adapter->fw_done);
+
+	return 0;
+}
+
 static int init_resources(struct ibmvnic_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
@@ -851,6 +913,10 @@ static int init_resources(struct ibmvnic_adapter *adapter)
 	if (rc)
 		return rc;
 
+	adapter->vpd = kzalloc(sizeof(*adapter->vpd), GFP_KERNEL);
+	if (!adapter->vpd)
+		return -ENOMEM;
+
 	adapter->map_id = 1;
 	adapter->napi = kcalloc(adapter->req_rx_queues,
 				sizeof(struct napi_struct), GFP_KERNEL);
@@ -924,7 +990,7 @@ static int __ibmvnic_open(struct net_device *netdev)
 static int ibmvnic_open(struct net_device *netdev)
 {
 	struct ibmvnic_adapter *adapter = netdev_priv(netdev);
-	int rc;
+	int rc, vpd;
 
 	mutex_lock(&adapter->reset_lock);
@@ -951,6 +1017,12 @@ static int ibmvnic_open(struct net_device *netdev)
 	rc = __ibmvnic_open(netdev);
 	netif_carrier_on(netdev);
+
+	/* Vital Product Data (VPD) */
+	vpd = ibmvnic_get_vpd(adapter);
+	if (vpd)
+		netdev_err(netdev, "failed to initialize Vital Product Data (VPD)\n");
+
 	mutex_unlock(&adapter->reset_lock);
 
 	return rc;
@@ -1879,11 +1951,15 @@ static int ibmvnic_get_link_ksettings(struct net_device *netdev,
 	return 0;
 }
 
-static void ibmvnic_get_drvinfo(struct net_device *dev,
+static void ibmvnic_get_drvinfo(struct net_device *netdev,
 				struct ethtool_drvinfo *info)
 {
+	struct ibmvnic_adapter *adapter = netdev_priv(netdev);
+
 	strlcpy(info->driver, ibmvnic_driver_name, sizeof(info->driver));
 	strlcpy(info->version, IBMVNIC_DRIVER_VERSION, sizeof(info->version));
+	strlcpy(info->fw_version, adapter->fw_version,
+		sizeof(info->fw_version));
}
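One hazard worth flagging in the hunk above: assigning the result of krealloc() straight back to adapter->vpd->buff means a failed reallocation overwrites (and leaks) the only pointer to the old buffer. A minimal userspace sketch of the leak-safe idiom, using realloc() in place of krealloc():

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Safe grow: keep the old pointer until realloc() succeeds, so a failure
 * doesn't leak the original allocation.  The same hazard applies to
 * kernel krealloc().  Newly grown bytes are zeroed. */
static int grow_buffer(unsigned char **buf, size_t *len, size_t new_len)
{
	unsigned char *tmp = realloc(*buf, new_len);

	if (!tmp)
		return -1;	/* *buf is still valid and still owned */
	if (new_len > *len)
		memset(tmp + *len, 0, new_len - *len);
	*buf = tmp;
	*len = new_len;
	return 0;
}
```

On failure the caller still owns the old buffer and can free it or keep using it, instead of losing it the moment the assignment clobbers the pointer.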
Re: [PATCHv4 0/6] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers
On 10.11.2017 00:48, Sergey Senozhatsky wrote:
> All Ack-s/Tested-by-s were dropped, since the patch set has been
> reworked. I'm kindly asking arch maintainers and developers to test it
> once again. Sorry for any inconvenience and thanks for your help in
> advance.

I tested it successfully on parisc64.
You can add back the
Tested-by: Helge Deller # parisc64

Thanks!
Helge
Re: [PATCH v2] powerpc64/exceptions: Refactor code to eliminate a few memory loads
Michael Ellerman wrote:
> "Naveen N. Rao" writes:
>
>> On 2017/06/19 03:21PM, Aneesh Kumar K.V wrote:
>>> > @@ -1445,8 +1446,8 @@ do_hash_page:
>>> >  handle_page_fault:
>>> >  	andis.	r0,r4,DSISR_DABRMATCH@h
>>> >  	bne-	handle_dabr_fault
>>> > -	ld	r4,_DAR(r1)
>>> > -	ld	r5,_DSISR(r1)
>>> > +	mr	r5,r4
>>> > +	mr	r4,r3
>>> >  	addi	r3,r1,STACK_FRAME_OVERHEAD
>>> >  	bl	do_page_fault
>>> >  	cmpdi	r3,0
>>>
>>> Can we avoid that if we rearrange args of other functions calls, so that
>>> we can use r3 and r4 as it is ?
>>
>> Here's a version that does that. Again, boot tested with radix and
>> disable_radix.
>>
>> Thanks,
>> Naveen
>>
>> -
>> Change data_access_common() and instruction_access_common() to load the
>> trap number in r3, DAR in r4 and DSISR in r5 (rather than in r5, r3 and
>> r4 respectively). This change allows us to eliminate a few unnecessary
>> memory loads and register move operations in handle_page_fault(),
>> handle_dabr_fault() and label '77'.
>>
>> Signed-off-by: Naveen N. Rao
>> ---
>>  arch/powerpc/kernel/exceptions-64s.S | 38 +---
>>  1 file changed, 18 insertions(+), 20 deletions(-)
>
> Sorry I missed this and now it doesn't apply. Do you mind rebasing.

No problem - this is just a small refactoring after all :).
Here's a rebased version. Boot tested on mambo with/without radix.

Thanks,
Naveen

--
[PATCH v3] powerpc/exceptions64s: Eliminate a few unnecessary memory loads

Change data_access_common() and instruction_access_common() to load the
trap number in r3, DAR in r4 and DSISR in r5 (rather than in r5, r3 and
r4 respectively). This change allows us to eliminate a few unnecessary
memory loads and register move operations in handle_page_fault(),
handle_dabr_fault() and label '77'.

Signed-off-by: Naveen N. Rao
---
 arch/powerpc/kernel/exceptions-64s.S | 38 +---
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 72cffa0c4af6..b8166b4fa728 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -499,11 +499,11 @@ EXC_COMMON_BEGIN(data_access_common)
 	EXCEPTION_PROLOG_COMMON(0x300, PACA_EXGEN)
 	RECONCILE_IRQ_STATE(r10, r11)
 	ld	r12,_MSR(r1)
-	ld	r3,PACA_EXGEN+EX_DAR(r13)
-	lwz	r4,PACA_EXGEN+EX_DSISR(r13)
-	li	r5,0x300
-	std	r3,_DAR(r1)
-	std	r4,_DSISR(r1)
+	ld	r4,PACA_EXGEN+EX_DAR(r13)
+	lwz	r5,PACA_EXGEN+EX_DSISR(r13)
+	li	r3,0x300
+	std	r4,_DAR(r1)
+	std	r5,_DSISR(r1)
 BEGIN_MMU_FTR_SECTION
 	b	do_hash_page	/* Try to handle as hpte fault */
 MMU_FTR_SECTION_ELSE
@@ -543,11 +543,11 @@ EXC_COMMON_BEGIN(instruction_access_common)
 	EXCEPTION_PROLOG_COMMON(0x400, PACA_EXGEN)
 	RECONCILE_IRQ_STATE(r10, r11)
 	ld	r12,_MSR(r1)
-	ld	r3,_NIP(r1)
-	andis.	r4,r12,DSISR_BAD_FAULT_64S@h
-	li	r5,0x400
-	std	r3,_DAR(r1)
-	std	r4,_DSISR(r1)
+	ld	r4,_NIP(r1)
+	andis.	r5,r12,DSISR_BAD_FAULT_64S@h
+	li	r3,0x400
+	std	r4,_DAR(r1)
+	std	r5,_DSISR(r1)
 BEGIN_MMU_FTR_SECTION
 	b	do_hash_page	/* Try to handle as hpte fault */
 MMU_FTR_SECTION_ELSE
@@ -1523,7 +1523,7 @@ do_hash_page:
 #ifdef CONFIG_PPC_BOOK3S_64
 	lis	r0,DSISR_BAD_FAULT_64S@h
 	ori	r0,r0,DSISR_BAD_FAULT_64S@l
-	and.	r0,r4,r0	/* weird error? */
+	and.	r0,r5,r0	/* weird error? */
 	bne-	handle_page_fault	/* if not, try to insert a HPTE */
 	CURRENT_THREAD_INFO(r11, r1)
 	lwz	r0,TI_PREEMPT(r11)	/* If we're in an "NMI" */
@@ -1538,8 +1538,10 @@ do_hash_page:
 	 *
 	 * at return r3 = 0 for success, 1 for page fault, negative for error
 	 */
+	mr	r6,r5
+	mr	r5,r3
+	mr	r3,r4
 	mr	r4,r12
-	ld	r6,_DSISR(r1)
 	bl	__hash_page	/* build HPTE if possible */
 	cmpdi	r3,0	/* see if __hash_page succeeded */
 
@@ -1549,16 +1551,15 @@ do_hash_page:
 	/* Error */
 	blt-	13f
 
-	/* Reload DSISR into r4 for the DABR check below */
-	ld	r4,_DSISR(r1)
+	/* Reload DAR/DSISR for handle_page_fault */
+	ld	r4,_DAR(r1)
+	ld	r5,_DSISR(r1)
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
 /* Here we have a page fault that hash_page can't handle. */
 handle_page_fault:
-11:	andis.	r0,r4,DSISR_DABRMATCH@h
+	andis.	r0,r5,DSISR_DABRMATCH@h
 	bne-	handle_dabr_fault
-	ld	r4,_DAR(r1)
-	ld	r5,_DSISR(r1)
 	addi
Re: linux-next: Tree for Nov 7
On 11/13/2017 09:06 AM, Michal Hocko wrote:
> OK, so this one should take care of the backward compatibility while
> still not touching the arch code
> ---
> commit 39ff9bf8597e79a032da0954aea1f0d77d137765
> Author: Michal Hocko
> Date:   Mon Nov 13 17:06:24 2017 +0100
>
>     mm: introduce MAP_FIXED_SAFE
>
>     MAP_FIXED is used quite often but it is inherently dangerous because it
>     unmaps an existing mapping covered by the requested range. While this
>     might really be the desired behavior in many cases, there are
>     others which would rather see a failure than a silent memory
>     corruption.
>
>     Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
>     It is a MAP_FIXED extension with a single exception: it fails with
>     ENOMEM if the requested address is already covered by an existing
>     mapping. We still rely on get_unmapped_area to handle all the arch
>     specific MAP_FIXED treatment and check for a conflicting vma after it
>     returns.
>
>     Signed-off-by: Michal Hocko
>
> ... deleted ...
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 680506faceae..aad8d37f0205 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1358,6 +1358,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> 	if (mm->map_count > sysctl_max_map_count)
> 		return -ENOMEM;
>
> +	/* force arch specific MAP_FIXED handling in get_unmapped_area */
> +	if (flags & MAP_FIXED_SAFE)
> +		flags |= MAP_FIXED;
> +
> 	/* Obtain the address to map to. we verify (or select) it and ensure
> 	 * that it represents a valid section of the address space.
> 	 */

Do you need to move this code above:

	if (!(flags & MAP_FIXED))
		addr = round_hint_to_min(addr);

	/* Careful about overflows.. */
	len = PAGE_ALIGN(len);
	if (!len)
		return -ENOMEM;

Not doing that might mean the hint address will end up being rounded for
MAP_FIXED_SAFE, which would change the behavior from MAP_FIXED.

--
Khalid
Re: linux-next: Tree for Nov 7
[Sorry for spamming, this one is the last attempt hopefully]

On Mon 13-11-17 16:49:39, Michal Hocko wrote:
> On Mon 13-11-17 16:16:41, Michal Hocko wrote:
> > On Mon 13-11-17 13:00:57, Michal Hocko wrote:
> > [...]
> > > Yes, I have mentioned that in the previous email but the amount of code
> > > would be even larger. Basically every arch which reimplements
> > > arch_get_unmapped_area would have to special case the new MAP_FIXED flag
> > > to do the vma lookup.
> >
> > It turned out that this might be much easier than I thought after
> > all. It seems we can really handle that in the common code. This would
> > mean that we are exposing a new functionality to the userspace though.
> > Maybe this would be useful on its own though. Just a quick draft (not
> > even compile tested) to see whether this makes sense in general. I would
> > be worried about unexpected behavior when somebody sets other bits
> > without a good reason and we might fail with ENOMEM for such a call now.
>
> Hmm, the bigger problem would be the backward compatibility actually. We
> would get silent corruptions, which is exactly what the flag is trying to
> fix. mmap flags handling really sucks. So I guess we would have to make
> the flag internal only :/

OK, so this one should take care of the backward compatibility while
still not touching the arch code
---
commit 39ff9bf8597e79a032da0954aea1f0d77d137765
Author: Michal Hocko
Date:   Mon Nov 13 17:06:24 2017 +0100

    mm: introduce MAP_FIXED_SAFE

    MAP_FIXED is used quite often but it is inherently dangerous because it
    unmaps an existing mapping covered by the requested range. While this
    might really be the desired behavior in many cases, there are
    others which would rather see a failure than a silent memory
    corruption.

    Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
    It is a MAP_FIXED extension with a single exception: it fails with
    ENOMEM if the requested address is already covered by an existing
    mapping. We still rely on get_unmapped_area to handle all the arch
    specific MAP_FIXED treatment and check for a conflicting vma after it
    returns.

    Signed-off-by: Michal Hocko

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 3b26cc62dadb..767bcb8a4c28 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -31,6 +31,8 @@
 #define MAP_STACK	0x8	/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x10	/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x200	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 #define MS_ASYNC	1	/* sync memory asynchronously */
 #define MS_SYNC	2	/* synchronous memory sync */
 #define MS_INVALIDATE	4	/* invalidate the caches */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index da3216007fe0..c2311eb7219b 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -49,6 +49,8 @@
 #define MAP_STACK	0x4	/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x8	/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x200	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 /*
  * Flags for msync
  */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index cc9ba1d34779..b06fd830bc6f 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -25,6 +25,8 @@
 #define MAP_STACK	0x4	/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x8	/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x200	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 #define MS_SYNC	1	/* synchronous memory sync */
 #define MS_ASYNC	2	/* sync memory asynchronously */
 #define MS_INVALIDATE	4	/* invalidate the caches */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index b15b278aa314..f4b291bca764 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -62,6 +62,8 @@
 # define MAP_UNINITIALIZED 0x0	/* Don't support this flag */
 #endif
 
+#define MAP_FIXED_SAFE	0x200	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 /*
  * Flags for msync
  */
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 203268f9231e..03c518777f83 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -25,6 +25,8 @@
 # define MAP_UNINITIALIZED 0x0	/* Don't support this flag */
 #endif
Re: linux-next: Tree for Nov 7
On Mon 13-11-17 15:48:13, Russell King - ARM Linux wrote:
> On Mon, Nov 13, 2017 at 04:16:41PM +0100, Michal Hocko wrote:
> > On Mon 13-11-17 13:00:57, Michal Hocko wrote:
> > [...]
> > > Yes, I have mentioned that in the previous email but the amount of code
> > > would be even larger. Basically every arch which reimplements
> > > arch_get_unmapped_area would have to special case the new MAP_FIXED flag
> > > to do the vma lookup.
> >
> > It turned out that this might be much easier than I thought after
> > all. It seems we can really handle that in the common code. This would
> > mean that we are exposing a new functionality to the userspace though.
> > Maybe this would be useful on its own though. Just a quick draft (not
> > even compile tested) to see whether this makes sense in general. I would
> > be worried about unexpected behavior when somebody sets other bits
> > without a good reason and we might fail with ENOMEM for such a call now.
> >
> > The elf loader would then use MAP_FIXED_SAFE rather than MAP_FIXED.
> > ---
> > diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> > index 3b26cc62dadb..d021c21f9b01 100644
> > --- a/arch/alpha/include/uapi/asm/mman.h
> > +++ b/arch/alpha/include/uapi/asm/mman.h
> > @@ -31,6 +31,9 @@
> >  #define MAP_STACK	0x8	/* give out an address that is best suited for process/thread stacks */
> >  #define MAP_HUGETLB	0x10	/* create a huge page mapping */
> >
> > +#define MAP_KEEP_MAPPING	0x200
> > +#define MAP_FIXED_SAFE	MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */
>
> A few things...
>
> 1. Does this need to be exposed to userland?

As I've written in another email, exposing the flag this way would be
really dangerous wrt. backward compatibility. So we would either need
some translation or make it a flag on its own and touch the arch
specific code, which I really wanted to avoid.

Whether this is something useful for the userspace is a separate
question which I should bring up on linux-api for a wider audience to
discuss. So I guess this comes down to whether we want/need something
like MAP_FIXED_SAFE, or opt out of the specific hardening code for
arches that cannot make unaligned mappings for the requested address.

> 2. Can it end up in include/uapi/asm-generic/mman*.h ?
> 3. The definition of MAP_FIXED_SAFE should really have parens around it.

Of course. I thought I did...

> > @@ -1365,6 +1365,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> >  	if (offset_in_page(addr))
> >  		return addr;
> >
> > +	if ((flags & MAP_FIXED_SAFE) == MAP_FIXED_SAFE) {
>
> I'm surprised this doesn't warn - since this effectively expands to:
>
> 	flags & MAP_FIXED | MAP_KEEP_MAPPING
>
> hence why MAP_FIXED_SAFE needs parens.

It sure does. Thanks!
-- 
Michal Hocko
SUSE Labs
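The parenthesization problem Russell points out is the classic composite-flag-macro pitfall: `&` binds tighter than `|`, so an unparenthesized `MAP_FIXED|MAP_KEEP_MAPPING` expands badly inside `flags & MAP_FIXED_SAFE`. A small standalone illustration, using made-up flag values (not the real MAP_* constants):

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_FIXED        0x10
#define DEMO_KEEP_MAPPING 0x200
#define DEMO_SAFE_BAD     DEMO_FIXED | DEMO_KEEP_MAPPING   /* missing parens */
#define DEMO_SAFE_GOOD    (DEMO_FIXED | DEMO_KEEP_MAPPING) /* correct */

/* With the unparenthesized macro, "flags & DEMO_SAFE_BAD" parses as
 * "(flags & DEMO_FIXED) | DEMO_KEEP_MAPPING", so this test passes
 * whenever DEMO_FIXED alone is set -- a false positive. */
static int looks_safe_bad(unsigned int flags)
{
	return (flags & DEMO_SAFE_BAD) == DEMO_SAFE_GOOD;
}

/* The parenthesized macro masks both bits, as intended. */
static int looks_safe_good(unsigned int flags)
{
	return (flags & DEMO_SAFE_GOOD) == DEMO_SAFE_GOOD;
}
```

This is why uapi flag macros conventionally use plain literals or fully parenthesized expressions: the macro body must survive being dropped into any expression context.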
Re: linux-next: Tree for Nov 7
On Mon 13-11-17 16:16:41, Michal Hocko wrote:
> On Mon 13-11-17 13:00:57, Michal Hocko wrote:
> [...]
> > Yes, I have mentioned that in the previous email but the amount of code
> > would be even larger. Basically every arch which reimplements
> > arch_get_unmapped_area would have to special case the new MAP_FIXED flag
> > to do the vma lookup.
>
> It turned out that this might be much easier than I thought after
> all. It seems we can really handle that in the common code. This would
> mean that we are exposing a new functionality to the userspace though.
> Maybe this would be useful on its own though. Just a quick draft (not
> even compile tested) to see whether this makes sense in general. I would
> be worried about unexpected behavior when somebody sets other bits
> without a good reason and we might fail with ENOMEM for such a call now.

Hmm, the bigger problem would be the backward compatibility actually. We
would get silent corruptions, which is exactly what the flag is trying to
fix. mmap flags handling really sucks. So I guess we would have to make
the flag internal only :/
-- 
Michal Hocko
SUSE Labs
Re: linux-next: Tree for Nov 7
On Mon, Nov 13, 2017 at 04:16:41PM +0100, Michal Hocko wrote:
> On Mon 13-11-17 13:00:57, Michal Hocko wrote:
> [...]
> > Yes, I have mentioned that in the previous email but the amount of code
> > would be even larger. Basically every arch which reimplements
> > arch_get_unmapped_area would have to special case the new MAP_FIXED flag
> > to do the vma lookup.
>
> It turned out that this might be much easier than I thought after
> all. It seems we can really handle that in the common code. This would
> mean that we are exposing a new functionality to the userspace though.
> Maybe this would be useful on its own though. Just a quick draft (not
> even compile tested) to see whether this makes sense in general. I would
> be worried about unexpected behavior when somebody sets other bits
> without a good reason and we might fail with ENOMEM for such a call now.
>
> The elf loader would then use MAP_FIXED_SAFE rather than MAP_FIXED.
> ---
> diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> index 3b26cc62dadb..d021c21f9b01 100644
> --- a/arch/alpha/include/uapi/asm/mman.h
> +++ b/arch/alpha/include/uapi/asm/mman.h
> @@ -31,6 +31,9 @@
>  #define MAP_STACK	0x8	/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x10	/* create a huge page mapping */
>
> +#define MAP_KEEP_MAPPING	0x200
> +#define MAP_FIXED_SAFE	MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */

A few things...

1. Does this need to be exposed to userland?
2. Can it end up in include/uapi/asm-generic/mman*.h ?
3. The definition of MAP_FIXED_SAFE should really have parens around it.

> @@ -1365,6 +1365,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  	if (offset_in_page(addr))
>  		return addr;
>
> +	if ((flags & MAP_FIXED_SAFE) == MAP_FIXED_SAFE) {

I'm surprised this doesn't warn - since this effectively expands to:

	flags & MAP_FIXED | MAP_KEEP_MAPPING

hence why MAP_FIXED_SAFE needs parens.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
Re: linux-next: Tree for Nov 7
On Mon 13-11-17 15:09:09, Russell King - ARM Linux wrote: > On Mon, Nov 13, 2017 at 03:11:40PM +0100, Michal Hocko wrote: > > On Mon 13-11-17 10:20:06, Michal Hocko wrote: > > > [Cc arm and ppc maintainers] > > > > > > Thanks a lot for testing! > > > > > > On Sun 12-11-17 11:38:02, Joel Stanley wrote: > > > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko> > > > wrote: > > > > > Hi Joel, > > > > > > > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > > > > [...] > > > > >> > There are a lot of messages on the way up that look like this: > > > > >> > > > > > >> > [2.527460] Uhuuh, elf segement at 000d9000 requested but the > > > > >> > memory is mapped already > > > > >> > [2.540160] Uhuuh, elf segement at 000d9000 requested but the > > > > >> > memory is mapped already > > > > >> > [2.546153] Uhuuh, elf segement at 000d9000 requested but the > > > > >> > memory is mapped already > > > > >> > > > > > >> > And then trying to run userspace looks like this: > > > > >> > > > > >> Could you please run with debugging patch posted > > > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63k...@dhcp22.suse.cz > > > > > > > > > > Did you have chance to test with this debugging patch, please? > > > > > > > > Lots of this: > > > > > > > > [1.177266] Uhuuh, elf segement at 000d9000 requested but the > > > > memory is mapped already, got 000dd000 > > > > [1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) > > > > > > This smells like the problem I've expected that mmap with hint doesn't > > > respect the hint even though there is no clashing mapping. The above > > > basically says that we didn't map at 0xd9000 but it has placed it at > > > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new > > > mapping. find_vma returns the closest vma (with addr < vm_end) for the > > > given address 0xd9000 so this address cannot be mapped by any other vma. 
> > > > > > Now that I am looking at arm's arch_get_unmapped_area it does perform > > > aligning for shared vmas. > > Sorry for the confusion here. These are not shared mappings as pointed out > > by Russell in a private email. I got confused by the above flags which I > > have misinterpreted as bit 0 set => MAP_SHARED. These are vm_flags > > obviously so bit 0 is VM_READ. Sorry about the confusion. The real > > reason we are doing the alignment is that we do a file mapping > > /* > > * We only need to do colour alignment if either the I or D > > * caches alias. > > */ > > if (aliasing) > > do_align = filp || (flags & MAP_SHARED); > > > > I am not really familiar with this architecture to understand why we > > need aliasing for file mappings, though. > I think it's there so that flush_dcache_page() works - possibly > get_user_pages() being used on a private mapping of page cache pages, > but that's guessing. I fail to see how the mixture of MAP_FIXED and a regular mapping of the same file works then, but as I've said I really do not understand this code. > I'm afraid I don't remember all the details, this is code from around > 15 years ago, and I'd be very nervous about changing it now without > fully understanding the issues. Ohh, absolutely! I didn't dare to touch this code and that's why I took the easy way and simply opted out of the hardening for all those archs that are basically sharing this pattern. But after a closer look it seems that we can really introduce MAP_FIXED_SAFE that would keep the arch mmap code intact yet we would get the hardening for all archs. It would also allow a safer MAP_FIXED semantic for userspace. -- Michal Hocko SUSE Labs
Re: powerpc-linux-gnu-ld: warning: orphan section `.data.rel.ro' from `arch/powerpc/kernel/head_44x.o' being placed in section `.data.rel.ro'.
Hi Nicholas, 2017-11-11 17:45 GMT+09:00 Nicholas Piggin: > On Fri, 10 Nov 2017 23:41:49 +0800 > kbuild test robot wrote: > >> tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git >> master >> head: 1c9dbd4615fd751e5e0b99807a3c7c8612e28e20 >> commit: cb87481ee89dbd6609e227afbf64900fb4e5c930 kbuild: linker script do >> not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured >> date: 3 months ago >> config: powerpc-fsp2_defconfig (attached as .config) >> compiler: powerpc-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0 >> reproduce: >> wget >> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O >> ~/bin/make.cross >> chmod +x ~/bin/make.cross >> git checkout cb87481ee89dbd6609e227afbf64900fb4e5c930 >> # save the attached .config to linux build tree >> make.cross ARCH=powerpc >> >> All warnings (new ones prefixed by >>): >> >> >> powerpc-linux-gnu-ld: warning: orphan section `.data.rel.ro' from >> >> `arch/powerpc/kernel/head_44x.o' being placed in section `.data.rel.ro'. >> >> powerpc-linux-gnu-ld: warning: orphan section `.data.rel.ro' from >> >> `arch/powerpc/kernel/head_44x.o' being placed in section `.data.rel.ro'. >> >> powerpc-linux-gnu-ld: warning: orphan section `.data.rel.ro' from >> >> `arch/powerpc/kernel/head_44x.o' being placed in section `.data.rel.ro'. > > Okay this is not caused by the above patch, it was just hiding it. > This should do the trick I think: > -- > powerpc/32: Add .data.rel* sections explicitly > > Match powerpc/64 and include .data.rel* input sections in the .data output > section explicitly. > > This should solve the warning: > > powerpc-linux-gnu-ld: warning: orphan section `.data.rel.ro' from > `arch/powerpc/kernel/head_44x.o' being placed in section `.data.rel.ro'. > > Reported-by: kbuild test robot > Signed-off-by: Nicholas Piggin I think this will go to the PPC tree. 
I will not touch this patch, but if necessary, please feel free to add Reviewed-by: Masahiro Yamada > --- > arch/powerpc/kernel/vmlinux.lds.S | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/powerpc/kernel/vmlinux.lds.S > b/arch/powerpc/kernel/vmlinux.lds.S > index 0494e1566ee2..51e4ec92ade1 100644 > --- a/arch/powerpc/kernel/vmlinux.lds.S > +++ b/arch/powerpc/kernel/vmlinux.lds.S > @@ -264,6 +264,7 @@ SECTIONS > #ifdef CONFIG_PPC32 > .data : AT(ADDR(.data) - LOAD_OFFSET) { > DATA_DATA > + *(.data.rel*) > *(.sdata) > *(.sdata2) > *(.got.plt) *(.got) > -- > 2.15.0 > -- Best Regards Masahiro Yamada
Re: linux-next: Tree for Nov 7
On Mon 13-11-17 13:00:57, Michal Hocko wrote: [...] > Yes, I have mentioned that in the previous email but the amount of code > would be even larger. Basically every arch which reimplements > arch_get_unmapped_area would have to special case the new MAP_FIXED flag to > do a vma lookup. It turned out that this might be much easier than I thought after all. It seems we can really handle that in the common code. This would mean that we are exposing new functionality to the userspace though. Maybe this would be useful on its own though. Just a quick draft (not even compile tested) whether this makes sense in general. I would be worried about unexpected behavior when somebody sets the other bit without a good reason and we might fail with ENOMEM for such a call now. The ELF loader would then use MAP_FIXED_SAFE rather than MAP_FIXED. --- diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h index 3b26cc62dadb..d021c21f9b01 100644 --- a/arch/alpha/include/uapi/asm/mman.h +++ b/arch/alpha/include/uapi/asm/mman.h @@ -31,6 +31,9 @@ #define MAP_STACK 0x8 /* give out an address that is best suited for process/thread stacks */ #define MAP_HUGETLB 0x10 /* create a huge page mapping */ +#define MAP_KEEP_MAPPING 0x200 +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */ + #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_SYNC 2 /* synchronous memory sync */ #define MS_INVALIDATE 4 /* invalidate the caches */ diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h index da3216007fe0..51e3885fbfc1 100644 --- a/arch/mips/include/uapi/asm/mman.h +++ b/arch/mips/include/uapi/asm/mman.h @@ -49,6 +49,9 @@ #define MAP_STACK 0x4 /* give out an address that is best suited for process/thread stacks */ #define MAP_HUGETLB 0x8 /* create a huge page mapping */ +#define MAP_KEEP_MAPPING 0x200 +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without
clobbering an existing mapping */ + /* * Flags for msync */ diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h index cc9ba1d34779..5a4381484fc5 100644 --- a/arch/parisc/include/uapi/asm/mman.h +++ b/arch/parisc/include/uapi/asm/mman.h @@ -25,6 +25,9 @@ #define MAP_STACK 0x4 /* give out an address that is best suited for process/thread stacks */ #define MAP_HUGETLB 0x8 /* create a huge page mapping */ +#define MAP_KEEP_MAPPING 0x200 +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */ + #define MS_SYNC 1 /* synchronous memory sync */ #define MS_ASYNC 2 /* sync memory asynchronously */ #define MS_INVALIDATE 4 /* invalidate the caches */ diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h index b15b278aa314..5df8a81524da 100644 --- a/arch/xtensa/include/uapi/asm/mman.h +++ b/arch/xtensa/include/uapi/asm/mman.h @@ -62,6 +62,9 @@ # define MAP_UNINITIALIZED 0x0 /* Don't support this flag */ #endif +#define MAP_KEEP_MAPPING 0x200 +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */ + /* * Flags for msync */ diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 203268f9231e..22442846f5c8 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -25,6 +25,9 @@ # define MAP_UNINITIALIZED 0x0 /* Don't support this flag */ #endif +#define MAP_KEEP_MAPPING 0x200 +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */ + /* * Flags for mlock */ diff --git a/mm/mmap.c b/mm/mmap.c index 680506faceae..e53b6b15a8d9 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1365,6 +1365,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (offset_in_page(addr)) return addr; + if ((flags & MAP_FIXED_SAFE) == MAP_FIXED_SAFE) { + struct vm_area_struct *vma
= find_vma(mm, addr); + + if (vma && vma->vm_start <= addr) + return -ENOMEM; + } + if (prot == PROT_EXEC) { pkey = execute_only_pkey(mm); if (pkey < 0) -- Michal Hocko SUSE Labs
Re: linux-next: Tree for Nov 7
On Mon, Nov 13, 2017 at 03:11:40PM +0100, Michal Hocko wrote: > On Mon 13-11-17 10:20:06, Michal Hocko wrote: > > [Cc arm and ppc maintainers] > > > > Thanks a lot for testing! > > > > On Sun 12-11-17 11:38:02, Joel Stanley wrote: > > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko wrote: > > > > Hi Joel, > > > > > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > > > [...] > > >> > There are a lot of messages on the way up that look like this: > > >> > > > >> > [2.527460] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [2.540160] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [2.546153] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > > > >> > And then trying to run userspace looks like this: > > >> > > >> Could you please run with debugging patch posted > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63k...@dhcp22.suse.cz > > > > > > Did you have a chance to test with this debugging patch, please? > > > > Lots of this: > > > > [1.177266] Uhuuh, elf segement at 000d9000 requested but the memory > > is mapped already, got 000dd000 > > [1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) > > > > This smells like the problem I've expected that mmap with hint doesn't > > respect the hint even though there is no clashing mapping. The above > > basically says that we didn't map at 0xd9000 but it has placed it at > > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new > > mapping. find_vma returns the closest vma (with addr < vm_end) for the > > given address 0xd9000 so this address cannot be mapped by any other vma. > > > > Now that I am looking at arm's arch_get_unmapped_area it does perform > > aligning for shared vmas. > > Sorry for the confusion here. These are not shared mappings as pointed out > > by Russell in a private email.
I got confused by the above flags which I > have misinterpreted as bit 0 set => MAP_SHARED. These are vm_flags > obviously so the bit 0 is VM_READ. Sorry about the confusion. The real > reason we are doing the alignment is that we do a file mapping > /* >* We only need to do colour alignment if either the I or D >* caches alias. >*/ > if (aliasing) > do_align = filp || (flags & MAP_SHARED); > > I am not really familiar with this architecture to understand why do we > need aliasing for file mappings, though. I think it's there so that flush_dcache_page() works - possibly get_user_pages() being used on a private mapping of page cache pages, but that's guessing. I'm afraid I don't remember all the details, this is code from around 15 years ago, and I'd be very nervous about changing it now without fully understanding the issues. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: [PATCH] [net-next,v3] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver
On 11/10/2017 01:13 PM, Desnes Augusto Nunes do Rosário wrote: > > > On 11/10/2017 12:54 PM, Nathan Fontenot wrote: >> On 11/10/2017 08:41 AM, Desnes Augusto Nunes do Rosário wrote: >>> >>> >>> On 11/09/2017 06:31 PM, Nathan Fontenot wrote: On 11/09/2017 01:00 PM, Desnes Augusto Nunes do Rosario wrote: > This patch implements and enables VPD support for the ibmvnic driver. > Moreover, it includes the implementation of suitable structs, signal > transmission/handling and functions which allow the retrieval of > firmware > information from the ibmvnic card through the ethtool command. > > Signed-off-by: Desnes A. Nunes do Rosario > Signed-off-by: Thomas Falcon > --- > drivers/net/ethernet/ibm/ibmvnic.c | 149 > - > drivers/net/ethernet/ibm/ibmvnic.h | 27 ++- > 2 files changed, 173 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c > b/drivers/net/ethernet/ibm/ibmvnic.c > index d0cff28..693b502 100644 > --- a/drivers/net/ethernet/ibm/ibmvnic.c > +++ b/drivers/net/ethernet/ibm/ibmvnic.c > @@ -573,6 +573,15 @@ static int reset_tx_pools(struct ibmvnic_adapter > *adapter) > return 0; > } > > +static void release_vpd_data(struct ibmvnic_adapter *adapter) > +{ > + if (!adapter->vpd) > + return; > + > + kfree(adapter->vpd->buff); > + kfree(adapter->vpd); > +} > + > static void release_tx_pools(struct ibmvnic_adapter *adapter) > { > struct ibmvnic_tx_pool *tx_pool; > @@ -753,6 +762,8 @@ static void release_resources(struct ibmvnic_adapter > *adapter) > { > int i; > > + release_vpd_data(adapter); > + > release_tx_pools(adapter); > release_rx_pools(adapter); > > @@ -833,6 +844,53 @@ static int set_real_num_queues(struct net_device > *netdev) > return rc; > } > > +static int ibmvnic_get_vpd(struct ibmvnic_adapter *adapter) > +{ > + struct device *dev = &adapter->vdev->dev; > + union ibmvnic_crq crq; > + dma_addr_t dma_addr; > + int len; > + > + if (adapter->vpd->buff) > + len = adapter->vpd->len; > + > + reinit_completion(&adapter->fw_done); > + crq.get_vpd_size.first
= IBMVNIC_CRQ_CMD; > + crq.get_vpd_size.cmd = GET_VPD_SIZE; > + ibmvnic_send_crq(adapter, &crq); > + wait_for_completion(&adapter->fw_done); > + Shouldn't there be a check for the return code when getting the vpd size? >>> >>> Hello Nathan, >>> >>> This check is already being performed on the handle_vpd_size_rsp() function >>> down below. >>> >>> In short, a GET_VPD_SIZE signal is sent here through a ibmvnic_crq union in >>> ibmvnic_send_crq(), whereas handle_query_ip_offload_rsp() receives from the >>> VNIC adapter a GET_VPD_SIZE_RSP containing a ibmvnic_crq union with the vpd >>> size information and the rc.code. If successful, the &adapter->fw_done completion is >>> signaled and this part of the code continues; however if not, a dev_error() is >>> thrown. Same logic applies to GET_VPD/GET_VPD_RSP. >>> >> >> Yes, I did see that code. You do a complete() of the completion variable for >> both success and failure, >> this then lets this routine continue regardless of the results of the get >> vpd size request. The >> call to dev_err will print the error message but does not cause us to bail if the >> get vpd size fails. Perhaps setting vpd->len to -1 to indicate the get vpd >> call failed which could >> then be checked by the requester. >> >> -Nathan >> >> >> What I am adding on the next version of the patch is a check if > adapter->vpd->len is different from 0 before allocating adapter->vpd->buff, > since, in the case of a failure, adapter->vpd->len will be 0. > > I do concur with your observation that the break is necessary. > > If the reception of vpd failed, adapter->vpd->len will still be zeroed out > since it was created with kzalloc in init_resources(). > > Thus, do you agree if in the next version I send the following code? Yes, this would be good. I think you should also add an explicit setting of len to 0 before the call too. Looking at the code before the get vpd size call, if a vpd buffer already exists then the len will be non-zero.
-Nathan > > === > + reinit_completion(&adapter->fw_done); > + crq.get_vpd_size.first = IBMVNIC_CRQ_CMD; > + crq.get_vpd_size.cmd = GET_VPD_SIZE; > + ibmvnic_send_crq(adapter, &crq); > + wait_for_completion(&adapter->fw_done); > + > ->+ if (!adapter->vpd->len) > ->+ return -ENODATA; > + > + if (!adapter->vpd->buff) > + adapter->vpd->buff = kzalloc(adapter->vpd->len, GFP_KERNEL); > + else if (adapter->vpd->len != len) >
Re: [PATCH v2 3/8] powerpc/mm: Separate ibm, dynamic-memory data from DT format
On 11/12/2017 06:43 AM, Michael Ellerman wrote: > Hi Nathan, > > Nathan Fontenot writes: >> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c >> index f83056297441..917184c13890 100644 >> --- a/arch/powerpc/kernel/prom.c >> +++ b/arch/powerpc/kernel/prom.c >> @@ -454,92 +455,93 @@ static int __init >> early_init_dt_scan_chosen_ppc(unsigned long node, > ... >> >> static int __init early_init_dt_scan_memory_ppc(unsigned long node, >> const char *uname, >> int depth, void *data) >> { >> +int rc; >> + >> if (depth == 1 && >> -strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0) >> -return early_init_dt_scan_drconf_memory(node); >> +strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0) { >> +rc = init_drmem_lmbs(node); >> +if (!rc) >> +early_init_dt_scan_drmem_lmbs(node); >> + >> +return rc; >> +} >> >> return early_init_dt_scan_memory(node, uname, depth, data); >> } > > There's one bug in here which is that you return rc as returned by > init_drmem_lmbs(). Returning non-zero from these scan routines > terminates the scan, which means if anything goes wrong in > init_drmem_lmbs() we may not call early_init_dt_scan_memory() > in which case we won't have any memory at all. > I didn't know this would stop scanning the device tree, thanks for letting me know. > I say "may not" because it depends on the order of the nodes in the > device tree whether you hit the memory nodes or the dynamic reconfig mem > info first. And the order of the nodes in the device tree is arbitrary > so we can't rely on that. > > >> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c >> new file mode 100644 >> index ..8ad7cf36b2c4 >> --- /dev/null >> +++ b/arch/powerpc/mm/drmem.c >> @@ -0,0 +1,84 @@ > ...
>> + >> +int __init init_drmem_lmbs(unsigned long node) >> +{ >> +struct drmem_lmb *lmb; >> +const __be32 *prop; >> +int prop_sz; >> +u32 len; >> + >> +prop = of_get_flat_dt_prop(node, "ibm,lmb-size", &len); >> +if (!prop || len < dt_root_size_cells * sizeof(__be32)) >> +return -1; >> + >> +drmem_info->lmb_size = dt_mem_next_cell(dt_root_size_cells, &prop); >> + >> +prop = of_get_flat_dt_prop(node, "ibm,dynamic-memory", &len); >> +if (!prop || len < dt_root_size_cells * sizeof(__be32)) >> +return -1; >> + >> +drmem_info->n_lmbs = of_read_number(prop++, 1); >> +prop_sz = drmem_info->n_lmbs * sizeof(struct of_drconf_cell) >> + + sizeof(__be32); >> +if (prop_sz < len) >> +return -1; >> + >> +drmem_info->lmbs = alloc_bootmem(drmem_info->n_lmbs * sizeof(*lmb)); >> +if (!drmem_info->lmbs) >> +return -1; > > The bigger problem we have though is that you're trying to allocate > memory, in order to find out what memory we have :) > > I suspect it works in some cases because you hit the memory@0 node first > in the device tree, and add that memory to memblock, which means > init_drmem_lmbs() *can* allocate memory, and everything's good. > > But if we hit init_drmem_lmbs() first, or there's not enough space in > memory@0, then allocating memory in order to discover memory is not > going to work. > > I'm not sure what the best solution is. One option would be to > statically allocate some space, so that we can discover some of the LMBs > without doing an allocation. But we wouldn't be able to guarantee that > we had enough space in that static allocation, so the code would need to > handle doing that and then potentially finding more LMBs later using a > dynamic alloc. So that could be a bit messy. > > The other option would be for the early_init_dt_scan_drmem_lmbs() code > to still work on the device tree directly, rather than using the > drmem_info array. That would make for uglier code, but may be necessary.
> I have been thinking about my initial approach, and the more I look at it the more I do not like trying to do the bootmem allocation. As you mention, there is just too much that could go wrong with that. I have started looking at a design where an interface similar to walk_memory_range() is used for the prom and numa code so we do not have to rely on making the allocation for the lmb array early in boot. The lmb array could then be allocated in the late_initcall in drmem.c, at which point the general kernel allocator is available. I'm still working on getting this coded up and will send out a new patch set once it's ready, unless anyone has objections to this approach. -Nathan
Re: [PATCH v2 2/2] powerpc/pci: Unroll two pass loop when scanning bridges
On Fri, Nov 10, 2017 at 07:52:30PM +0200, Andy Shevchenko wrote: > The current scanning code is really hard to understand because it calls > the same function in a loop where the pass value is changed without any > comments explaining it: > > for (pass = 0; pass < 2; pass++) > for_each_pci_bridge(dev, bus) > max = pci_scan_bridge(bus, dev, max, pass); > > An unfamiliar reader cannot easily tell what the purpose of this loop is > without looking at the internals of pci_scan_bridge(). > > In order to make this bit easier to understand, open-code the loop in > pci_scan_child_bus() and pci_hp_add_bridge() with added comments. > > No functional changes intended. > > Cc: Mika Westerberg
Reviewed-by: Mika Westerberg
Re: linux-next: Tree for Nov 7
On Mon 13-11-17 10:20:06, Michal Hocko wrote: > [Cc arm and ppc maintainers] > > Thanks a lot for testing! > > On Sun 12-11-17 11:38:02, Joel Stanley wrote: > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko wrote: > > > Hi Joel, > > > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > > [...] > > >> > There are a lot of messages on the way up that look like this: > > >> > > > >> > [2.527460] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [2.540160] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [2.546153] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > > > >> > And then trying to run userspace looks like this: > > >> > > >> Could you please run with debugging patch posted > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63k...@dhcp22.suse.cz > > > > > > Did you have a chance to test with this debugging patch, please? > > > > Lots of this: > > > > [1.177266] Uhuuh, elf segement at 000d9000 requested but the memory is > > mapped already, got 000dd000 > > [1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) > > This smells like the problem I've expected that mmap with hint doesn't > respect the hint even though there is no clashing mapping. The above > basically says that we didn't map at 0xd9000 but it has placed it at > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new > mapping. find_vma returns the closest vma (with addr < vm_end) for the > given address 0xd9000 so this address cannot be mapped by any other vma. > > Now that I am looking at arm's arch_get_unmapped_area it does perform > aligning for shared vmas. Sorry for the confusion here. These are not shared mappings as pointed out by Russell in a private email. I got confused by the above flags which I have misinterpreted as bit 0 set => MAP_SHARED. These are vm_flags obviously so bit 0 is VM_READ. Sorry about the confusion.
The real reason we are doing the alignment is that we do a file mapping /* * We only need to do colour alignment if either the I or D * caches alias. */ if (aliasing) do_align = filp || (flags & MAP_SHARED); I am not really familiar with this architecture to understand why we need aliasing for file mappings, though. -- Michal Hocko SUSE Labs
[PATCH] powerpc/32: Add .data.rel* sections explicitly
Match powerpc/64 and include .data.rel* input sections in the .data output section explicitly. This solves the warning: powerpc-linux-gnu-ld: warning: orphan section `.data.rel.ro' from `arch/powerpc/kernel/head_44x.o' being placed in section `.data.rel.ro'. Link: https://lists.01.org/pipermail/kbuild-all/2017-November/040010.html Reported-by: kbuild test robot
Signed-off-by: Nicholas Piggin --- arch/powerpc/kernel/vmlinux.lds.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index 0494e1566ee2..51e4ec92ade1 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -264,6 +264,7 @@ SECTIONS #ifdef CONFIG_PPC32 .data : AT(ADDR(.data) - LOAD_OFFSET) { DATA_DATA + *(.data.rel*) *(.sdata) *(.sdata2) *(.got.plt) *(.got) -- 2.15.0
Re: linux-next: Tree for Nov 7
On Mon 13-11-17 22:34:50, Michael Ellerman wrote: > Hi Michal, > > Michal Hocko writes: > > On Mon 13-11-17 10:20:06, Michal Hocko wrote: > >> [Cc arm and ppc maintainers] > > > > Hmm, it turned out to be a problem on other architectures as well. > > CCing more maintainers. For your reference, we are talking about > > http://lkml.kernel.org/r/20171023082608.6167-1-mho...@kernel.org > > which has broken architectures which do apply aligning on the mmap > > address hint without MAP_FIXED applied. See below my proposed way > > around this issue because I believe that the above patch is too > > valuable on its own to be dropped for all archs. > > I don't really like your solution sorry :) The fact that you've had to > patch seven arches seems like a red flag. > > I think this is a generic problem with MAP_FIXED, which I've heard > userspace folks complain about in the past. The thing is that we cannot change MAP_FIXED behavior as it is carved in stone > Currently MAP_FIXED does two things: > 1. makes addr not a hint but the required address > 2. blasts any existing mapping > > You want 1) but not 2). + fail if there is a clashing range > So the right solution IMHO would be to add a new mmap flag to request > that behaviour, ie. a fixed address but iff there is nothing already > mapped there. > > I don't know the mm code well enough to know if that's hard for some > reason, but it *seems* like it should be doable. Yes, I have mentioned that in the previous email but the amount of code would be even larger. Basically every arch which reimplements arch_get_unmapped_area would have to special case the new MAP_FIXED flag to do a vma lookup. So this was the simplest solution I could come up with. If there is general interest in MAP_FIXED_SAFE then we can introduce it later of course. I would just like the hardening merged sooner rather than later. -- Michal Hocko SUSE Labs
Re: [PATCH v3] kernel/module_64.c: Add REL24 relocation support of livepatch symbols
On Monday 13 November 2017 02:08 PM, Balbir Singh wrote: On Fri, Nov 10, 2017 at 2:28 PM, Josh Poimboeuf wrote: On Fri, Nov 10, 2017 at 01:06:25PM +1100, Balbir Singh wrote: On Fri, Nov 10, 2017 at 2:19 AM, Josh Poimboeuf wrote: FWIW, I think it won't matter anyway. I'm currently pursuing the option of inserting nops after local calls, because it has less runtime complexity than using a stub. I think I've figured out a way to do it with a GCC plugin, but if that doesn't work I'll try the asm listing sed approach suggested by Michael. A plugin that runs for the new kernel with the patch? Just for specific files involved in the patch? The plugin will affect the code generation of all functions in the patch module. So all calls in all replacement functions will have the nops. Here's a prototype (not yet fully tested): https://github.com/jpoimboe/kpatch/blob/TODO-ppc-fix/kpatch-build/gcc-plugins/ppc64le-plugin.c [...] I'd have to look closer, but I wonder what happens if a function has no global entry point (See http://mpe.github.io/posts/2016/05/23/kernel-live-patching-for-ppc64le/) section Understanding the TOC & entry points and the example of int_to_scsilun() int_to_scsilun() is accessed via its global entry point, but the TOC setup part in the function prologue is skipped as the function doesn't have a need to save/restore the TOC. The reason being, it doesn't call any global functions, thus the r2 register is not clobbered. As per my understanding, the plug-in inserts a nop instruction after a branch to a function called via its local entry point, i.e. a call to a local function. In the case of int_to_scsilun(), the plug-in will not modify the function in any way, as there are no local calls made by int_to_scsilun(). -- cheers, Kamalesh.
Re: linux-next: Tree for Nov 7
Hi Michal, Michal Hocko writes: > On Mon 13-11-17 10:20:06, Michal Hocko wrote: >> [Cc arm and ppc maintainers] > > Hmm, it turned out to be a problem on other architectures as well. > CCing more maintainers. For your reference, we are talking about > http://lkml.kernel.org/r/20171023082608.6167-1-mho...@kernel.org > which has broken architectures which do apply aligning on the mmap > address hint without MAP_FIXED applied. See below my proposed way > around this issue because I believe that the above patch is too > valuable on its own to be dropped for all archs. I don't really like your solution sorry :) The fact that you've had to patch seven arches seems like a red flag. I think this is a generic problem with MAP_FIXED, which I've heard userspace folks complain about in the past. Currently MAP_FIXED does two things: 1. makes addr not a hint but the required address 2. blasts any existing mapping You want 1) but not 2). So the right solution IMHO would be to add a new mmap flag to request that behaviour, ie. a fixed address but iff there is nothing already mapped there. I don't know the mm code well enough to know if that's hard for some reason, but it *seems* like it should be doable. cheers
Re: [PATCH] bootwrapper: mspsc.c: fix pointer-to-int-cast warnings
Hi Michal, Thanks for trying to fix this. Michal Suchanek writes: > I get these warnings: > > ../arch/powerpc/boot/mpsc.c: In function 'mpsc_get_virtreg_of_phandle': > ../arch/powerpc/boot/mpsc.c:113:35: warning: cast from pointer to > integer of different size [-Wpointer-to-int-cast] > > ../arch/powerpc/boot/mpsc.c: In function 'mpsc_console_init': > ../arch/powerpc/boot/mpsc.c:147:12: warning: cast from pointer to > integer of different size [-Wpointer-to-int-cast] > > Presumably the patch below fixes these, and presumably the DT defines > that pointers and integers have the same size in the DT so this is fine Actually no it doesn't. A device tree doesn't contain pointers, and it doesn't say anything about the size of a pointer on a platform that uses that device tree. phandles are not pointers, they are handles, and they are always 32-bit. So this code is just wrong on 64-bit, you can't safely cast from a 64-bit pointer to a phandle. But I think this warning has actually been fixed differently by commit: 866bfc75f40e ("powerpc: conditionally compile platform-specific serial drivers") If you're still seeing the warning let me know. cheers
Re: linux-next: Tree for Nov 7
On Mon 13-11-17 10:20:06, Michal Hocko wrote:
> [Cc arm and ppc maintainers]

Hmm, it turned out to be a problem on other architectures as well. CCing more
maintainers. For your reference, we are talking about
http://lkml.kernel.org/r/20171023082608.6167-1-mho...@kernel.org
which has broken architectures which do apply aligning on the mmap address hint
without MAP_FIXED applied. See below my proposed way around this issue, because
I believe that the above patch is too valuable on its own to be dropped for all
archs.

> Thanks a lot for testing!
>
> On Sun 12-11-17 11:38:02, Joel Stanley wrote:
> > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko wrote:
> > > Hi Joel,
> > >
> > > On Wed 08-11-17 15:20:50, Michal Hocko wrote:
> > > [...]
> > >> > There are a lot of messages on the way up that look like this:
> > >> >
> > >> > [2.527460] Uhuuh, elf segment at 000d9000 requested but the memory is mapped already
> > >> > [2.540160] Uhuuh, elf segment at 000d9000 requested but the memory is mapped already
> > >> > [2.546153] Uhuuh, elf segment at 000d9000 requested but the memory is mapped already
> > >> >
> > >> > And then trying to run userspace looks like this:
> > >>
> > >> Could you please run with the debugging patch posted at
> > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63k...@dhcp22.suse.cz
> > >
> > > Did you have a chance to test with this debugging patch, please?
> >
> > Lots of this:
> >
> > [1.177266] Uhuuh, elf segment at 000d9000 requested but the memory is mapped already, got 000dd000
> > [1.177555] Clashing vma [dd000, de000] flags:100873 name:(null)
>
> This smells like the problem I expected: mmap with a hint doesn't respect the
> hint even though there is no clashing mapping. The above basically says that
> we didn't map at 0xd9000 but it has been placed at 0xdd000. The nearest
> (clashing) vma is at 0xdd000, so this is our new mapping.
> find_vma returns the closest vma (with addr < vm_end) for the given address
> 0xd9000, so this address cannot be mapped by any other vma.
>
> Now that I am looking at arm's arch_get_unmapped_area, it does perform
> aligning for shared vmas. We do not do that for MAP_FIXED. Powerpc, reported
> earlier [1], seems to suffer from a similar problem: slice_get_unmapped_area
> aligns to slices, whatever that means.
>
> I can see two possible ways around that. Either we explicitly request
> non-aligned mappings via a special MAP_$FOO (e.g. MAP_FIXED_SAFE), or we
> simply opt out from the MAP_FIXED protection via ifdefs. The first option
> sounds more generic to me but also more tricky to not introduce other
> user-visible effects. The latter is quite straightforward. What do you think
> about the following on top of the previous patch?
>
> It is rather terse and disables the MAP_FIXED protection for arm completely,
> because I couldn't find a way to make it conditional on
> CACHEID_VIPT_ALIASING. But this can always be handled later. I find the
> protection for other architectures useful enough to have this working for
> most architectures now and handle others specially.
>
> [1] http://lkml.kernel.org/r/1510048229.12079.7.ca...@abdul.in.ibm.com
> ---
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 61a0cb15067e..018d041a30e6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -99,6 +99,7 @@ config ARM
 	select PERF_USE_VMALLOC
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
+	select ARCH_ALIGNED_MMAPS
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
 	help
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 48d91d5be4e9..eca59d27e9f1 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -72,6 +72,7 @@ config MIPS
 	select RTC_LIB if !MACH_LOONGSON64
 	select SYSCTL_EXCEPTION_TRACE
 	select VIRT_TO_BUS
+	select ARCH_ALIGNED_MMAPS

 menu "Machine selection"

diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 22f27ec8c117..8376d16e0a4a 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -40,6 +40,7 @@ config PARISC
 	select GENERIC_CLOCKEVENTS
 	select ARCH_NO_COHERENT_DMA_MMAP
 	select CPU_NO_EFFICIENT_FFS
+	select ARCH_ALIGNED_MMAPS
 	help
 	  The PA-RISC microprocessor is designed by Hewlett-Packard and used

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 2f629e0551e9..156f69c09c7f 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -368,6 +368,7 @@ config PPC_MM_SLICES
 	bool
 	default y if PPC_STD_MMU_64
 	default n
+	select ARCH_ALIGNED_MMAPS

 config PPC_HAVE_PMU_SUPPORT
 	bool

diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 640a85925060..ac1d4637a728 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -49,6 +49,7 @@ config SUPERH
 	select
Re: [PATCHv4 3/6] powerpc64: Add .opd based function descriptor dereference
On (11/13/17 12:41), Santosh Sivaraj wrote:
> * Sergey Senozhatsky wrote (on 2017-11-10 08:48:27 +0900):
> > We are moving towards separate kernel and module function descriptor
> > dereference callbacks. This patch enables it for powerpc64.
> >
> > For pointers that belong to the kernel
> > - Added __start_opd and __end_opd pointers, to track the kernel
> >   .opd section address range;
> >
> > - Added dereference_kernel_function_descriptor(). Now we
> >   will dereference only function pointers that are within
> >   [__start_opd, __end_opd);
> >
> > For pointers that belong to a module
> > - Added dereference_module_function_descriptor() to handle module
> >   function descriptor dereference. Now we will dereference only
> >   pointers that are within [module->opd.start, module->opd.end).
> >
> > Signed-off-by: Sergey Senozhatsky
> > ---
> >  arch/powerpc/include/asm/module.h   |  3 +++
> >  arch/powerpc/include/asm/sections.h | 12 ++++++++++++
> >  arch/powerpc/kernel/module_64.c     | 14 ++++++++++++++
> >  arch/powerpc/kernel/vmlinux.lds.S   |  2 ++
> >  4 files changed, 31 insertions(+)
>
> Looks good on powerpc. If you wish:
>
> Tested-by: Santosh Sivaraj  # for powerpc

thanks!

-ss
Re: linux-next: Tree for Nov 7
On Mon, Nov 13, 2017 at 10:20:06AM +0100, Michal Hocko wrote:
> [Cc arm and ppc maintainers]
>
> Thanks a lot for testing!
>
> On Sun 12-11-17 11:38:02, Joel Stanley wrote:
> > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko wrote:
> > > Hi Joel,
> > >
> > > On Wed 08-11-17 15:20:50, Michal Hocko wrote:
> > > [...]
> > >> > There are a lot of messages on the way up that look like this:
> > >> >
> > >> > [2.527460] Uhuuh, elf segment at 000d9000 requested but the memory is mapped already
> > >> > [2.540160] Uhuuh, elf segment at 000d9000 requested but the memory is mapped already
> > >> > [2.546153] Uhuuh, elf segment at 000d9000 requested but the memory is mapped already
> > >> >
> > >> > And then trying to run userspace looks like this:
> > >>
> > >> Could you please run with the debugging patch posted at
> > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63k...@dhcp22.suse.cz
> > >
> > > Did you have a chance to test with this debugging patch, please?
> >
> > Lots of this:
> >
> > [1.177266] Uhuuh, elf segment at 000d9000 requested but the memory is mapped already, got 000dd000
> > [1.177555] Clashing vma [dd000, de000] flags:100873 name:(null)
>
> This smells like the problem I expected: mmap with a hint doesn't respect the
> hint even though there is no clashing mapping. The above basically says that
> we didn't map at 0xd9000 but it has been placed at 0xdd000. The nearest
> (clashing) vma is at 0xdd000, so this is our new mapping. find_vma returns the
> closest vma (with addr < vm_end) for the given address 0xd9000, so this
> address cannot be mapped by any other vma.
>
> Now that I am looking at arm's arch_get_unmapped_area, it does perform
> aligning for shared vmas. We do not do that for MAP_FIXED. Powerpc, reported
> earlier [1], seems to suffer from a similar problem: slice_get_unmapped_area
> aligns to slices, whatever that means.
>
> I can see two possible ways around that.
> Either we explicitly request non-aligned mappings via a special MAP_$FOO
> (e.g. MAP_FIXED_SAFE), or we simply opt out from the MAP_FIXED protection via
> ifdefs. The first option sounds more generic to me but also more tricky to
> not introduce other user-visible effects. The latter is quite
> straightforward. What do you think about the following on top of the previous
> patch?
>
> It is rather terse and disables the MAP_FIXED protection for arm completely,
> because I couldn't find a way to make it conditional on
> CACHEID_VIPT_ALIASING. But this can always be handled later. I find the
> protection for other architectures useful enough to have this working for
> most architectures now and handle others specially.

Can someone provide the background information for this please?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
Re: linux-next: Tree for Nov 7
[Cc arm and ppc maintainers]

Thanks a lot for testing!

On Sun 12-11-17 11:38:02, Joel Stanley wrote:
> On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko wrote:
> > Hi Joel,
> >
> > On Wed 08-11-17 15:20:50, Michal Hocko wrote:
> > [...]
> >> > There are a lot of messages on the way up that look like this:
> >> >
> >> > [2.527460] Uhuuh, elf segment at 000d9000 requested but the memory is mapped already
> >> > [2.540160] Uhuuh, elf segment at 000d9000 requested but the memory is mapped already
> >> > [2.546153] Uhuuh, elf segment at 000d9000 requested but the memory is mapped already
> >> >
> >> > And then trying to run userspace looks like this:
> >>
> >> Could you please run with the debugging patch posted at
> >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63k...@dhcp22.suse.cz
> >
> > Did you have a chance to test with this debugging patch, please?
>
> Lots of this:
>
> [1.177266] Uhuuh, elf segment at 000d9000 requested but the memory is mapped already, got 000dd000
> [1.177555] Clashing vma [dd000, de000] flags:100873 name:(null)

This smells like the problem I expected: mmap with a hint doesn't respect the
hint even though there is no clashing mapping. The above basically says that we
didn't map at 0xd9000 but it has been placed at 0xdd000. The nearest (clashing)
vma is at 0xdd000, so this is our new mapping. find_vma returns the closest vma
(with addr < vm_end) for the given address 0xd9000, so this address cannot be
mapped by any other vma.

Now that I am looking at arm's arch_get_unmapped_area, it does perform aligning
for shared vmas. We do not do that for MAP_FIXED. Powerpc, reported earlier
[1], seems to suffer from a similar problem: slice_get_unmapped_area aligns to
slices, whatever that means.

I can see two possible ways around that. Either we explicitly request
non-aligned mappings via a special MAP_$FOO (e.g. MAP_FIXED_SAFE), or we simply
opt out from the MAP_FIXED protection via ifdefs.
The first option sounds more generic to me but also more tricky to not
introduce other user-visible effects. The latter is quite straightforward. What
do you think about the following on top of the previous patch?

It is rather terse and disables the MAP_FIXED protection for arm completely,
because I couldn't find a way to make it conditional on CACHEID_VIPT_ALIASING.
But this can always be handled later. I find the protection for other
architectures useful enough to have this working for most architectures now and
handle others specially.

[1] http://lkml.kernel.org/r/1510048229.12079.7.ca...@abdul.in.ibm.com
---
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 61a0cb15067e..018d041a30e6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -99,6 +99,7 @@ config ARM
 	select PERF_USE_VMALLOC
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
+	select ARCH_ALIGNED_MMAPS
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
 	help
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 2f629e0551e9..156f69c09c7f 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -368,6 +368,7 @@ config PPC_MM_SLICES
 	bool
 	default y if PPC_STD_MMU_64
 	default n
+	select ARCH_ALIGNED_MMAPS

 config PPC_HAVE_PMU_SUPPORT
 	bool
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index a22718de42db..d23eb89f31c0 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -345,13 +345,19 @@ static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
 		unsigned long size, int prot, int type, unsigned long off)
 {
 	unsigned long map_addr;
+	unsigned long map_type = type;

 	/*
 	 * If caller requests the mapping at a specific place, make sure we fail
 	 * rather than potentially clobber an existing mapping which can have
-	 * security consequences (e.g. smash over the stack area).
+	 * security consequences (e.g. smash over the stack area).
Be careful
+	 * about architectures which do not respect the address hint due to
+	 * aligning restrictions for !fixed mappings.
 	 */
-	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
+	if (!IS_ENABLED(ARCH_ALIGNED_MMAPS))
+		map_type &= ~MAP_FIXED;
+
+	map_addr = vm_mmap(filep, addr, size, prot, map_type, off);
 	if (BAD_ADDR(map_addr))
 		return map_addr;

-- 
Michal Hocko
SUSE Labs
Re: [PATCH v3] kernel/module_64.c: Add REL24 relocation support of livepatch symbols
On Fri, Nov 10, 2017 at 2:28 PM, Josh Poimboeuf wrote:
> On Fri, Nov 10, 2017 at 01:06:25PM +1100, Balbir Singh wrote:
>> On Fri, Nov 10, 2017 at 2:19 AM, Josh Poimboeuf wrote:
>> > FWIW, I think it won't matter anyway. I'm currently pursuing the option
>> > of inserting nops after local calls, because it has less runtime
>> > complexity than using a stub.
>> >
>> > I think I've figured out a way to do it with a GCC plugin, but if that
>> > doesn't work I'll try the asm listing sed approach suggested by Michael.
>>
>> A plugin that runs for the new kernel with the patch? Just for
>> specific files involved in the patch?
>
> The plugin will affect the code generation of all functions in the patch
> module. So all calls in all replacement functions will have the nops.
>
> Here's a prototype (not yet fully tested):
>
> https://github.com/jpoimboe/kpatch/blob/TODO-ppc-fix/kpatch-build/gcc-plugins/ppc64le-plugin.c

Thanks! I looked at it briefly; I need to catch up with GCC plugins, code <
1000 was interesting. I guess now we don't need to pass via the new kpatch
helper stub, we would just replace the no-ops and be done!

I'd have to look closer, but I wonder what happens if a function has no global
entry point (see
http://mpe.github.io/posts/2016/05/23/kernel-live-patching-for-ppc64le/,
section "Understanding the TOC & entry points", and the example of
int_to_scsilun()).

Balbir Singh.
[PATCH] powerpc/npu: Cleanup MMIO ATSD flushing
While reviewing the code I found that the flush assumes all pages are of
mmu_linear_psize, which is not correct. The patch uses find_linux_pte to find
the right page size and uses that for launching the ATSD invalidation. A new
helper is added to abstract the invalidation from the various notifiers. The
patch also cleans up a bit by removing the AP size from PID flushes.

Signed-off-by: Balbir Singh
---
Changelog since RFC
- Implement review comments from Aneesh

 arch/powerpc/platforms/powernv/npu-dma.c | 66 ++++++++++++++++++++---------
 1 file changed, 49 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index 2cb6cbea4b3b..bf12a023a8de 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -27,6 +28,7 @@
 #include
 #include
 #include
+#include
 #include "powernv.h"
 #include "pci.h"
@@ -459,9 +461,6 @@ static int mmio_invalidate_pid(struct npu *npu, unsigned long pid, bool flush)
 	/* PRS set to process-scoped */
 	launch |= PPC_BIT(13);

-	/* AP */
-	launch |= (u64) mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17);
-
 	/* PID */
 	launch |= pid << PPC_BITLSHIFT(38);
@@ -473,7 +472,8 @@ static int mmio_invalidate_pid(struct npu *npu, unsigned long pid, bool flush)
 }

 static int mmio_invalidate_va(struct npu *npu, unsigned long va,
-			unsigned long pid, bool flush)
+			unsigned long pid, bool flush,
+			unsigned int shift)
 {
 	unsigned long launch;
@@ -484,9 +484,8 @@ static int mmio_invalidate_va(struct npu *npu, unsigned long va,
 	launch |= PPC_BIT(13);

 	/* AP */
-	launch |= (u64) mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17);
+	launch |= (u64) mmu_get_ap(shift) << PPC_BITLSHIFT(17);

-	/* PID */
 	launch |= pid << PPC_BITLSHIFT(38);

 	/* No flush */
@@ -503,7 +502,8 @@ struct mmio_atsd_reg {
 };

 static void mmio_invalidate_wait(
-	struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS], bool flush)
+	struct mmio_atsd_reg
mmio_atsd_reg[NV_MAX_NPUS], bool flush,
+	unsigned int shift)
 {
 	struct npu *npu;
 	int i, reg;
@@ -536,7 +536,8 @@ static void mmio_invalidate_wait(
  * the value of va.
  */
 static void mmio_invalidate(struct npu_context *npu_context, int va,
-			unsigned long address, bool flush)
+			unsigned long address, bool flush,
+			unsigned int shift)
 {
 	int i, j;
 	struct npu *npu;
@@ -569,7 +570,7 @@ static void mmio_invalidate(struct npu_context *npu_context, int va,
 		if (va)
 			mmio_atsd_reg[i].reg =
 				mmio_invalidate_va(npu, address, pid,
-						flush);
+						flush, shift);
 		else
 			mmio_atsd_reg[i].reg =
 				mmio_invalidate_pid(npu, pid, flush);
@@ -582,10 +583,43 @@ static void mmio_invalidate(struct npu_context *npu_context, int va,
 		}
 	}

-	mmio_invalidate_wait(mmio_atsd_reg, flush);
+	mmio_invalidate_wait(mmio_atsd_reg, flush, shift);
 	if (flush)
 		/* Wait for the flush to complete */
-		mmio_invalidate_wait(mmio_atsd_reg, false);
+		mmio_invalidate_wait(mmio_atsd_reg, false, shift);
+}
+
+static void pnv_npu2_invalidate_helper(struct npu_context *npu_context,
+			struct mm_struct *mm, unsigned long start,
+			unsigned long end, bool flush)
+{
+	unsigned long address;
+	bool is_thp = false;
+	unsigned int hshift = 0, shift;
+
+	address = start;
+	do {
+		local_irq_disable();
+		find_linux_pte(mm->pgd, address, &is_thp, &hshift);
+		if (!is_thp)
+			shift = PAGE_SHIFT;
+		else if (hshift && !is_thp)
+			shift = hshift;
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		else
+			shift = HPAGE_PMD_SIZE;
+#else
+		else {
+			shift = PAGE_SHIFT;
+			pr_warn_once("unsupport page size for mm %p,addr %lx\n",
+					mm, start);
+		}
+#endif
+		mmio_invalidate(npu_context, address > 0, address, flush,
+				shift);
+		local_irq_enable();
+		address += (1ull << shift);
+	} while (address < end);
 }

 static void pnv_npu2_mn_release(struct mmu_notifier *mn,
@@ -601,7 +635,7 @@ static void