scheduler crash on Power
I am getting this crash on a PowerPC system using the 3.16.0-rc7 kernel plus
some patches related to perf (24x7 counters) that Cody Schafer posted here:

	https://lkml.org/lkml/2014/5/27/768

I don't get the crash on an unpatched kernel, though. I have been staring at
the perf event patches, but can't find anything impacting the scheduler.
Besides, the patches had worked on a 3.16.0-rc2 kernel on a different Power
system.

The crash occurs on an idle system, a minute or two after booting to
runlevel 3.

kernel/sched/core.c:
---
5877 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
5878 {
5879         struct sched_group *sg = sd->groups;
5880
5881         WARN_ON(!sg);
5882
5883         do {
5884                 sg->group_weight = cpumask_weight(sched_group_cpus(sg));
---

I tried applying the patch discussed in https://lkml.org/lkml/2014/7/16/386
but it doesn't seem to help.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bc1638b..50702a8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5842,6 +5842,8 @@ build_sched_groups(struct sched_domain *sd, int cpu)
 			continue;
 
 		group = get_group(i, sdd, &sg);
+		cpumask_clear(sched_group_cpus(sg));
+		sg->sgc->capacity = 0;
 		cpumask_setall(sched_group_mask(sg));
 
 		for_each_cpu(j, span) {

I am also attaching the debug messages that Peterz added here:

	https://lkml.org/lkml/2014/7/17/288

Appreciate any debug suggestions.
Sukadev

Red Hat Enterprise Linux Server 7.0 (Maipo)
Kernel 3.16.0-rc7-24x7+ on an ppc64

ltcbrazos2-lp07 login:
[ 181.915974] [ cut here ]
[ 181.915991] WARNING: at ../kernel/sched/core.c:5881
[ 181.915994] Modules linked in: sg cfg80211 rfkill nx_crypto ibmveth pseries_rng xfs libcrc32c sd_mod crc_t10dif crct10dif_common ibmvscsi scsi_transport_srp scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[ 181.916024] CPU: 4 PID: 1087 Comm: kworker/4:2 Not tainted 3.16.0-rc7-24x7+ #15
[ 181.916034] Workqueue: events .topology_work_fn
[ 181.916038] task: c000dbd4 ti: c000da40 task.ti: c000da40
[ 181.916043] NIP: c00d7528 LR: c00d7578 CTR:
[ 181.916047] REGS: c000da403580 TRAP: 0700 Not tainted (3.16.0-rc7-24x7+)
[ 181.916051] MSR: 800100029032 CR: 28484c24 XER:
[ 181.916063] CFAR: c00d74f4 SOFTE: 1
GPR00: c00d7578 c000da403800 c0eaa7f0 0800
GPR04: 0800 0800 c09cf878
GPR08: c09cf880 0001 0010
GPR12: cebe1200 0800 c000cc2f
GPR16: c0ef0a68 0078 c000e500 0078
GPR20: 0001 c000cc2f 0001
GPR24: c0db4402 000f c000dea39300
GPR28: c0ef0ae0 c000e544 c0ef4f7c
[ 181.916146] NIP [c00d7528] .build_sched_domains+0xc28/0xd90
[ 181.916151] LR [c00d7578] .build_sched_domains+0xc78/0xd90
[ 181.916155] Call Trace:
[ 181.916159] [c000da403800] [c00d7578] .build_sched_domains+0xc78/0xd90 (unreliable)
[ 181.916166] [c000da403950] [c00d7950] .partition_sched_domains+0x260/0x3f0
[ 181.916175] [c000da403a30] [c0141704] .rebuild_sched_domains_locked+0x54/0x70
[ 181.916182] [c000da403ab0] [c0143a98] .rebuild_sched_domains+0x28/0x50
[ 181.916188] [c000da403b30] [c004f250] .topology_work_fn+0x10/0x30
[ 181.916194] [c000da403ba0] [c00b7100] .process_one_work+0x1a0/0x4c0
[ 181.916199] [c000da403c40] [c00b7970] .worker_thread+0x180/0x630
[ 181.916205] [c000da403d30] [c00bfc88] .kthread+0x108/0x130
[ 181.916214] [c000da403e30] [c000a3e4] .ret_from_kernel_thread+0x58/0x74
[ 181.916220] Instruction dump:
[ 181.916223] 7f47492a e93c e90a0010 7d0a4378 7d4a482a 814a 2f8a 419e0008
[ 181.916235] 7f48492a ebdd0010 7fc90074 7929d182 <0b09> 4814 6000 6000
[ 181.916245] ---[ end trace 6e9d20016598c36c ]---
[ 181.916253] Unable to handle kernel paging request for data at address 0x0018
[ 181.916257] Faulting instruction address: 0xc039d1c0
[ 181.916263] Oops: Kernel access of bad area, sig: 11 [#1]
[ 181.916267] SMP NR_CPUS=2048 NUMA pSeries
[ 181.916271] Modules linked in: sg cfg80211 rfkill nx_crypto ibmveth pseries_rng xfs libcrc32c sd_mod crc_t10dif crct10dif_common ibmvscsi scsi_transport_srp scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[ 181.916293] CPU: 4 PID: 1087 Comm: kworker/4:2 Tainted: GW 3.16.0-rc7-24x7+ #15
[ 181.916299] Workqueue: events .top
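For anyone reading the trace: the warning at core.c:5881 is the WARN_ON(!sg) in init_sched_groups_capacity(), and the do/while right after it dereferences sg unconditionally, which appears consistent with the paging fault at a small address that follows the warning. Below is a minimal userspace sketch of that kind of walk over a circular group list, with an explicit NULL guard added. It is an illustration only: struct group_sketch, popcount_sketch() and init_group_weights() are hypothetical names, not kernel code, and this is not a proposed fix.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sched_group: a circular, singly linked ring. */
struct group_sketch {
	struct group_sketch *next;
	unsigned int cpu_mask;		/* one bit per CPU in the group */
	unsigned int weight;		/* number of CPUs, filled in below */
};

/* Count set bits; userspace stand-in for cpumask_weight(). */
static unsigned int popcount_sketch(unsigned int v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Walk the ring once and fill in each group's weight.
 * Returns the number of groups visited, or -1 if there is no ring at all.
 */
static int init_group_weights(struct group_sketch *first)
{
	struct group_sketch *sg = first;
	int visited = 0;

	if (!sg)
		return -1;	/* the kernel code only warns here and carries on */

	do {
		sg->weight = popcount_sketch(sg->cpu_mask);
		visited++;
		sg = sg->next;
	} while (sg != first);

	return visited;
}
```

The guard only avoids the dereference; why the group pointer is NULL in the first place is the actual question in the report.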
[PATCH v4 00/16] powernv: vfio: Add Dynamic DMA windows (DDW)
This prepares existing upstream kernel for DDW (Dynamic DMA windows) and
adds actual DDW support for VFIO.

This patchset does not contain any in-kernel acceleration stuff.

This patchset does not enable DDW for emulated devices.

Changes:
v4:
* addressed Ben's comments
* big rework with moving tce_xxx callbacks out of ppc_md

v3:
* applied multiple comments from Gavin regarding error checking
and callbacks placements

v2:
* moved "Account TCE pages in locked_vm" here (was in later series)
* added counting for huge window to locked_vm (ugly but better than nothing)
* fixed bug with missing >>PAGE_SHIFT when calling pfn_to_page

Alexey Kardashevskiy (16):
  rcu: Define notrace version of list_for_each_entry_rcu and list_entry_rcu
  KVM: PPC: Use RCU for arch.spapr_tce_tables
  mm: Add helpers for locked_vm
  KVM: PPC: Account TCE-containing pages in locked_vm
  powerpc/iommu: Fix comments with it_page_shift
  powerpc/powernv: Make invalidate() a callback
  powerpc/spapr: vfio: Implement spapr_tce_iommu_ops
  powerpc/powernv: Convert/move set_bypass() callback to take_ownership()
  powerpc/iommu: Fix IOMMU ownership control functions
  powerpc: Move tce_xxx callbacks from ppc_md to iommu_table
  powerpc/powernv: Release replaced TCE
  powerpc/pseries/lpar: Enable VFIO
  powerpc/powernv: Implement Dynamic DMA windows (DDW) for IODA
  vfio: powerpc/spapr: Reuse locked_vm accounting helpers
  vfio: powerpc/spapr: Use it_page_size
  vfio: powerpc/spapr: Enable Dynamic DMA windows

 arch/powerpc/include/asm/iommu.h            |  33 ++-
 arch/powerpc/include/asm/kvm_host.h         |   1 +
 arch/powerpc/include/asm/machdep.h          |  25 --
 arch/powerpc/include/asm/tce.h              |  38 +++
 arch/powerpc/kernel/iommu.c                 | 158 -
 arch/powerpc/kernel/vio.c                   |   5 +-
 arch/powerpc/kvm/book3s.c                   |   2 +-
 arch/powerpc/kvm/book3s_64_vio.c            |  43 +++-
 arch/powerpc/kvm/book3s_64_vio_hv.c         |   6 +-
 arch/powerpc/platforms/cell/iommu.c         |   9 +-
 arch/powerpc/platforms/pasemi/iommu.c       |   8 +-
 arch/powerpc/platforms/powernv/pci-ioda.c   | 239 ---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |   4 +-
 arch/powerpc/platforms/powernv/pci.c        |  86 ---
 arch/powerpc/platforms/powernv/pci.h        |  16 +-
 arch/powerpc/platforms/pseries/iommu.c      |  77 --
 arch/powerpc/sysdev/dart_iommu.c            |  13 +-
 drivers/vfio/vfio_iommu_spapr_tce.c         | 348
 include/linux/mm.h                          |   3 +
 include/linux/rculist.h                     |  38 +++
 include/uapi/linux/vfio.h                   |  37 ++-
 mm/mlock.c                                  |  49
 22 files changed, 990 insertions(+), 248 deletions(-)

-- 
2.0.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v4 02/16] KVM: PPC: Use RCU for arch.spapr_tce_tables
At the moment spapr_tce_tables is not protected against races. This makes use of RCU-variants of list helpers. As some bits are executed in real mode, this makes use of just introduced list_for_each_entry_rcu_notrace(). This converts release_spapr_tce_table() to a RCU scheduled handler. Signed-off-by: Alexey Kardashevskiy --- Changes: * total rework * kfree() for kvmppc_spapr_tce_table is moved to call_rcu_sched() callback * used new list_for_each_entry_rcu_notrace --- arch/powerpc/include/asm/kvm_host.h | 1 + arch/powerpc/kvm/book3s.c | 2 +- arch/powerpc/kvm/book3s_64_vio.c| 23 +-- arch/powerpc/kvm/book3s_64_vio_hv.c | 6 -- 4 files changed, 19 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index bb66d8b..cd22c31 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table { struct kvm *kvm; u64 liobn; u32 window_size; + struct rcu_head rcu; struct page *pages[0]; }; diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index c254c27..9e17d19 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -886,7 +886,7 @@ int kvmppc_core_init_vm(struct kvm *kvm) { #ifdef CONFIG_PPC64 - INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables); + INIT_LIST_HEAD_RCU(&kvm->arch.spapr_tce_tables); INIT_LIST_HEAD(&kvm->arch.rtas_tokens); #endif diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c index 54cf9bc..5958f7d 100644 --- a/arch/powerpc/kvm/book3s_64_vio.c +++ b/arch/powerpc/kvm/book3s_64_vio.c @@ -45,19 +45,16 @@ static long kvmppc_stt_npages(unsigned long window_size) * sizeof(u64), PAGE_SIZE) / PAGE_SIZE; } -static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt) +static void release_spapr_tce_table(struct rcu_head *head) { - struct kvm *kvm = stt->kvm; + struct kvmppc_spapr_tce_table *stt = container_of(head, + struct kvmppc_spapr_tce_table, rcu); int i; - 
mutex_lock(&kvm->lock); - list_del(&stt->list); for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++) __free_page(stt->pages[i]); + kvm_put_kvm(stt->kvm); kfree(stt); - mutex_unlock(&kvm->lock); - - kvm_put_kvm(kvm); } static int kvm_spapr_tce_fault(struct vm_area_struct *vma, struct vm_fault *vmf) @@ -87,8 +84,13 @@ static int kvm_spapr_tce_mmap(struct file *file, struct vm_area_struct *vma) static int kvm_spapr_tce_release(struct inode *inode, struct file *filp) { struct kvmppc_spapr_tce_table *stt = filp->private_data; + struct kvm *kvm = stt->kvm; + + mutex_lock(&kvm->lock); + list_del_rcu(&stt->list); + call_rcu_sched(&stt->rcu, release_spapr_tce_table); + mutex_unlock(&kvm->lock); - release_spapr_tce_table(stt); return 0; } @@ -106,7 +108,8 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, int i; /* Check this LIOBN hasn't been previously allocated */ - list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) { + list_for_each_entry_rcu_notrace(stt, &kvm->arch.spapr_tce_tables, + list) { if (stt->liobn == args->liobn) return -EBUSY; } @@ -131,7 +134,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, kvm_get_kvm(kvm); mutex_lock(&kvm->lock); - list_add(&stt->list, &kvm->arch.spapr_tce_tables); + list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables); mutex_unlock(&kvm->lock); diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c index 89e96b3..b1914d9 100644 --- a/arch/powerpc/kvm/book3s_64_vio_hv.c +++ b/arch/powerpc/kvm/book3s_64_vio_hv.c @@ -50,7 +50,8 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */ /* liobn, ioba, tce); */ - list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) { + list_for_each_entry_rcu_notrace(stt, &kvm->arch.spapr_tce_tables, + list) { if (stt->liobn == liobn) { unsigned long idx = ioba >> SPAPR_TCE_SHIFT; struct page *page; @@ -82,7 +83,8 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, 
unsigned long liobn, struct kvm *kvm = vcpu->kvm; struct kvmppc_spapr_tce_table *stt; - list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) { + list_for_each_entry_rcu_notrace(stt, &kvm->arch.spapr_tce_tables, + list) { if (stt->liobn == liobn) { unsigned long idx = ioba >> SPAPR_TCE_SHIFT; struc
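The release path in this patch relies on a callback that receives only the embedded rcu_head and recovers the enclosing descriptor with container_of(). A minimal userspace sketch of that pattern follows. Everything in it is an assumption made for illustration: struct cb_head_sketch stands in for struct rcu_head, call_deferred_sketch() stands in for call_rcu_sched() and simply runs the callback immediately instead of waiting for a grace period, and the descriptor is reduced to a single field.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct rcu_head. */
struct cb_head_sketch {
	void (*func)(struct cb_head_sketch *head);
};

/* Hypothetical, heavily reduced table descriptor. */
struct tce_table_sketch {
	unsigned long liobn;
	struct cb_head_sketch rcu;	/* embedded, like "struct rcu_head rcu" */
};

/* Records which table the callback released, so the effect is observable. */
static unsigned long last_released_liobn;

/* The callback gets only the embedded head and recovers the descriptor. */
static void release_table_sketch(struct cb_head_sketch *head)
{
	struct tce_table_sketch *stt =
		container_of_sketch(head, struct tce_table_sketch, rcu);

	last_released_liobn = stt->liobn;
	free(stt);
}

/*
 * Stand-in for call_rcu_sched(). A real implementation defers the callback
 * until readers are done; this sketch just invokes it right away.
 */
static void call_deferred_sketch(struct cb_head_sketch *head,
				 void (*func)(struct cb_head_sketch *head))
{
	head->func = func;
	head->func(head);
}
```

Because the head is embedded in the descriptor, no extra allocation is needed to queue the release, and the descriptor must not be touched by the caller once the callback has been queued.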
[PATCH v4 04/16] KVM: PPC: Account TCE-containing pages in locked_vm
At the moment pages used for TCE tables (not pages addressed by TCEs)
are not counted in the locked_vm counter, so a malicious userspace tool can
call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE allows and
lock a lot of memory.

This adds counting for pages used for TCE tables.

This counts the number of pages required for a table plus pages for
the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.

This does not change the amount of (de)allocated memory.

Signed-off-by: Alexey Kardashevskiy
---
Changes:
v4:
* fixed counting for kvmppc_spapr_tce_table (used to be +1 page)
* added 2 helpers to common MM code for later reuse from vfio-spapr
---
 arch/powerpc/kvm/book3s_64_vio.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 5958f7d..b32aeb1 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -45,16 +45,33 @@ static long kvmppc_stt_npages(unsigned long window_size)
 		     * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
+static long kvmppc_account_memlimit(long npages, bool inc)
+{
+	long stt_pages = ALIGN(sizeof(struct kvmppc_spapr_tce_table) +
+			(abs(npages) * sizeof(struct page *)), PAGE_SIZE);
+
+	npages += stt_pages;
+	if (inc)
+		return try_increment_locked_vm(npages);
+
+	decrement_locked_vm(npages);
+
+	return 0;
+}
+
 static void release_spapr_tce_table(struct rcu_head *head)
 {
 	struct kvmppc_spapr_tce_table *stt = container_of(head,
 			struct kvmppc_spapr_tce_table, rcu);
 	int i;
+	long npages = kvmppc_stt_npages(stt->window_size);
 
-	for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
+	for (i = 0; i < npages; i++)
 		__free_page(stt->pages[i]);
 	kvm_put_kvm(stt->kvm);
 	kfree(stt);
+
+	kvmppc_account_memlimit(npages, false);
 }
 
 static int kvm_spapr_tce_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
@@ -115,6 +132,9 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
 	}
 
 	npages = kvmppc_stt_npages(args->window_size);
+	ret = kvmppc_account_memlimit(npages, true);
+	if (ret)
+		goto fail;
 
 	stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
 		      GFP_KERNEL);
-- 
2.0.0
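The accounting described in the commit message is "pages for the table plus pages for the descriptor". A small userspace sketch of that arithmetic follows, under stated assumptions: a 4K page size, a 4K TCE granule (shift of 12), 8 bytes per TCE, and a caller-supplied descriptor header size. All names are hypothetical. One point on units: an ALIGN()-style round-up yields a byte count, so the sketch divides by the page size before adding the two terms; that division is an assumption about the intent and is not taken from the posted hunk.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096UL	/* assumption: 4K pages */
#define SKETCH_TCE_SHIFT	12	/* assumption: one TCE per 4K of window */

/* Round x up to a multiple of a (a need not be a power of two here). */
#define SKETCH_ALIGN(x, a)	((((x) + (a) - 1) / (a)) * (a))

/* Pages needed to hold the TCE table itself for a given DMA window size. */
static unsigned long tce_table_pages(unsigned long window_size)
{
	unsigned long bytes = (window_size >> SKETCH_TCE_SHIFT) * sizeof(uint64_t);

	return SKETCH_ALIGN(bytes, SKETCH_PAGE_SIZE) / SKETCH_PAGE_SIZE;
}

/*
 * Pages needed for the descriptor: a fixed header followed by one page
 * pointer per table page.
 */
static unsigned long descriptor_pages(unsigned long npages,
				      unsigned long header_bytes)
{
	unsigned long bytes = header_bytes + npages * sizeof(void *);

	return SKETCH_ALIGN(bytes, SKETCH_PAGE_SIZE) / SKETCH_PAGE_SIZE;
}

/* Total number of pages to charge against the locked memory limit. */
static unsigned long accounted_pages(unsigned long window_size,
				     unsigned long header_bytes)
{
	unsigned long npages = tce_table_pages(window_size);

	return npages + descriptor_pages(npages, header_bytes);
}
```

With these assumptions a 1 GiB window needs 512 table pages, and the descriptor adds one or two more depending on pointer size.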
[PATCH v4 03/16] mm: Add helpers for locked_vm
This adds 2 helpers to change the locked_vm counter:
- try_increment_locked_vm - may fail if the new locked_vm value would be
greater than the RLIMIT_MEMLOCK limit;
- decrement_locked_vm.

These will be used by drivers capable of locking memory at userspace
request. For example, VFIO can use it to check if it can lock DMA memory
or PPC-KVM can use it to check if it can lock memory for TCE tables.

Signed-off-by: Alexey Kardashevskiy
---
 include/linux/mm.h |  3 +++
 mm/mlock.c         | 49 +
 2 files changed, 52 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e03dd29..1cb219d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2113,5 +2113,8 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern long try_increment_locked_vm(long npages);
+extern void decrement_locked_vm(long npages);
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/mlock.c b/mm/mlock.c
index b1eb536..39e4b55 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -864,3 +864,52 @@ void user_shm_unlock(size_t size, struct user_struct *user)
 	spin_unlock(&shmlock_user_lock);
 	free_uid(user);
 }
+
+/**
+ * try_increment_locked_vm() - checks if new locked_vm value is going to
+ * be less than RLIMIT_MEMLOCK and increments it by npages if it is.
+ *
+ * @npages: the number of pages to add to locked_vm.
+ *
+ * Returns 0 if succeeded or negative value if failed.
+ */
+long try_increment_locked_vm(long npages)
+{
+	long ret = 0, locked, lock_limit;
+
+	if (!current || !current->mm)
+		return -ESRCH; /* process exited */
+
+	down_write(&current->mm->mmap_sem);
+	locked = current->mm->locked_vm + npages;
+	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+	if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
+		pr_warn("RLIMIT_MEMLOCK (%ld) exceeded\n",
+				rlimit(RLIMIT_MEMLOCK));
+		ret = -ENOMEM;
+	} else {
+		current->mm->locked_vm += npages;
+	}
+	up_write(&current->mm->mmap_sem);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(try_increment_locked_vm);
+
+/**
+ * decrement_locked_vm() - decrements the current task's locked_vm counter.
+ *
+ * @npages: the number to decrement by.
+ */
+void decrement_locked_vm(long npages)
+{
+	if (!current || !current->mm)
+		return; /* process exited */
+
+	down_write(&current->mm->mmap_sem);
+	if (npages > current->mm->locked_vm)
+		npages = current->mm->locked_vm;
+	current->mm->locked_vm -= npages;
+	up_write(&current->mm->mmap_sem);
+}
+EXPORT_SYMBOL_GPL(decrement_locked_vm);
-- 
2.0.0
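The behaviour of the two helpers can be modelled in userspace as a counter with a limit: the increment either applies in full or fails, and the decrement clamps at zero. The sketch below is such a model under stated assumptions. struct mm_sketch and both function names are hypothetical, the limit is passed in pages rather than read from RLIMIT_MEMLOCK, the capability check is a plain boolean, and the mmap_sem locking of the real helpers is left out entirely.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of the mm that the helpers touch. */
struct mm_sketch {
	unsigned long locked_vm;	/* pages currently accounted as locked */
	unsigned long lock_limit;	/* limit in pages */
	bool can_ipc_lock;		/* stand-in for capable(CAP_IPC_LOCK) */
};

/* Add npages to the counter, or fail without changing it. */
static long try_increment_locked(struct mm_sketch *mm, unsigned long npages)
{
	unsigned long locked;

	if (!mm)
		return -ESRCH;	/* no address space to account against */

	locked = mm->locked_vm + npages;
	if (locked > mm->lock_limit && !mm->can_ipc_lock)
		return -ENOMEM;

	mm->locked_vm = locked;
	return 0;
}

/* Subtract npages from the counter, never going below zero. */
static void decrement_locked(struct mm_sketch *mm, unsigned long npages)
{
	if (!mm)
		return;

	if (npages > mm->locked_vm)
		npages = mm->locked_vm;
	mm->locked_vm -= npages;
}
```

The all-or-nothing increment is what lets a caller charge the pages before allocating them and bail out early on failure.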
[PATCH v4 05/16] powerpc/iommu: Fix comments with it_page_shift
There are a couple of commented-out debug prints which still use
IOMMU_PAGE_SHIFT(), which is not defined for POWERPC anymore; replace
them with it_page_shift.

Signed-off-by: Alexey Kardashevskiy
Reviewed-by: Gavin Shan
---
 arch/powerpc/kernel/iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 88e3ec6..f84f799 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1037,7 +1037,7 @@ int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
 
 	/* if (unlikely(ret))
 		pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx ret=%d\n",
-			__func__, hwaddr, entry << IOMMU_PAGE_SHIFT(tbl),
+			__func__, hwaddr, entry << tbl->it_page_shift,
 				hwaddr, ret); */
 
 	return ret;
@@ -1056,7 +1056,7 @@ int iommu_put_tce_user_mode(struct iommu_table *tbl, unsigned long entry,
 			direction != DMA_TO_DEVICE, &page);
 	if (unlikely(ret != 1)) {
 		/* pr_err("iommu_tce: get_user_pages_fast failed tce=%lx ioba=%lx ret=%d\n",
-				tce, entry << IOMMU_PAGE_SHIFT(tbl), ret); */
+				tce, entry << tbl->it_page_shift, ret); */
 		return -EFAULT;
 	}
 	hwaddr = (unsigned long) page_address(page) + offset;
-- 
2.0.0
[PATCH v4 08/16] powerpc/powernv: Convert/move set_bypass() callback to take_ownership()
At the moment the iommu_table struct has a set_bypass() callback which
enables/disables DMA bypass on an IODA2 PHB. This is exposed to POWERPC
IOMMU code which calls this callback when external IOMMU users such as
VFIO are about to take over a PHB.

Since set_bypass() is not really an iommu_table function but a PE
function, and we have an ops struct per IOMMU owner, let's move
set_bypass() to the spapr_tce_iommu_ops struct.

As arch/powerpc/kernel/iommu.c is more about POWERPC IOMMU tables and
has very little to do with PEs, this moves take_ownership() calls to
the VFIO SPAPR TCE driver.

This renames set_bypass() to take_ownership() as it is not necessarily
just enabling bypassing; it can be something else/more, so let's give it
a generic name. The bool parameter is inverted.

Signed-off-by: Alexey Kardashevskiy
Reviewed-by: Gavin Shan
---
 arch/powerpc/include/asm/iommu.h          |  1 -
 arch/powerpc/include/asm/tce.h            |  2 ++
 arch/powerpc/kernel/iommu.c               | 12
 arch/powerpc/platforms/powernv/pci-ioda.c | 18 +++---
 drivers/vfio/vfio_iommu_spapr_tce.c       | 17 +
 5 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 84ee339..2b0b01d 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -77,7 +77,6 @@ struct iommu_table {
 #ifdef CONFIG_IOMMU_API
 	struct iommu_group *it_group;
 #endif
-	void (*set_bypass)(struct iommu_table *tbl, bool enable);
 };
 
 /* Pure 2^n version of get_order */
diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
index 8bfe98f..5ee4987 100644
--- a/arch/powerpc/include/asm/tce.h
+++ b/arch/powerpc/include/asm/tce.h
@@ -58,6 +58,8 @@ struct spapr_tce_iommu_ops {
 	struct iommu_table *(*get_table)(
 			struct spapr_tce_iommu_group *data,
 			phys_addr_t addr);
+	void (*take_ownership)(struct spapr_tce_iommu_group *data,
+			bool enable);
 };
 
 struct spapr_tce_iommu_group {
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index
e203314..06984d5 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -1116,14 +1116,6 @@ int iommu_take_ownership(struct iommu_table *tbl) memset(tbl->it_map, 0xff, sz); iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size); - /* -* Disable iommu bypass, otherwise the user can DMA to all of -* our physical memory via the bypass window instead of just -* the pages that has been explicitly mapped into the iommu -*/ - if (tbl->set_bypass) - tbl->set_bypass(tbl, false); - return 0; } EXPORT_SYMBOL_GPL(iommu_take_ownership); @@ -1138,10 +1130,6 @@ void iommu_release_ownership(struct iommu_table *tbl) /* Restore bit#0 set by iommu_init_table() */ if (tbl->it_offset == 0) set_bit(0, tbl->it_map); - - /* The kernel owns the device now, we can restore the iommu bypass */ - if (tbl->set_bypass) - tbl->set_bypass(tbl, true); } EXPORT_SYMBOL_GPL(iommu_release_ownership); diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 495137b..f828c57 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -709,10 +709,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, __free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs)); } -static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable) +static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable) { - struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe, - tce32.table); uint16_t window_id = (pe->pe_number << 1 ) + 1; int64_t rc; @@ -752,15 +750,21 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb, /* TVE #1 is selected by PCI address bit 59 */ pe->tce_bypass_base = 1ull << 59; - /* Install set_bypass callback for VFIO */ - pe->tce32.table.set_bypass = pnv_pci_ioda2_set_bypass; - /* Enable bypass by default */ - pnv_pci_ioda2_set_bypass(&pe->tce32.table, true); + pnv_pci_ioda2_set_bypass(pe, true); +} + +static void 
pnv_ioda2_take_ownership(struct spapr_tce_iommu_group *data, +bool enable) +{ + struct pnv_ioda_pe *pe = data->iommu_owner; + + pnv_pci_ioda2_set_bypass(pe, !enable); } static struct spapr_tce_iommu_ops pnv_pci_ioda2_ops = { .get_table = pnv_ioda1_iommu_get_table, + .take_ownership = pnv_ioda2_take_ownership, }; static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb, diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c index
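The key detail of this patch is that take_ownership(enable) maps onto set_bypass(!enable): taking ownership for an external user turns the bypass window off, and releasing it turns the bypass back on. A reduced userspace sketch of that ops-struct dispatch follows. All type and function names are hypothetical stand-ins for the structures in the patch, and the bypass state is just a boolean instead of a firmware call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PE; bypass state is modelled as a flag. */
struct pe_sketch {
	bool bypass_enabled;
};

struct group_sketch;

/* Hypothetical stand-in for the per-owner ops struct. */
struct group_ops_sketch {
	void (*take_ownership)(struct group_sketch *data, bool enable);
};

/* Hypothetical stand-in for the group data that carries owner and ops. */
struct group_sketch {
	void *iommu_owner;			/* points at the owning PE */
	const struct group_ops_sketch *ops;
};

static void pe_set_bypass(struct pe_sketch *pe, bool enable)
{
	pe->bypass_enabled = enable;
}

/* Ownership taken means bypass off, hence the inverted flag. */
static void pe_take_ownership(struct group_sketch *data, bool enable)
{
	struct pe_sketch *pe = data->iommu_owner;

	pe_set_bypass(pe, !enable);
}

static const struct group_ops_sketch pe_ops_sketch = {
	.take_ownership = pe_take_ownership,
};

/* Caller side: the callback is optional, so check before calling it. */
static void group_set_owned(struct group_sketch *grp, bool owned)
{
	if (grp->ops && grp->ops->take_ownership)
		grp->ops->take_ownership(grp, owned);
}
```

Keeping the inversion inside the platform callback means the generic caller never needs to know that "ownership" is implemented via the bypass window.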
[PATCH v4 06/16] powerpc/powernv: Make invalidate() a callback
At the moment pnv_pci_ioda_tce_invalidate() gets the PE pointer via container_of(tbl). Since we are going to have to add Dynamic DMA windows and that means having 2 IOMMU tables per PE, this is not going to work. This implements pnv_pci_ioda(1|2)_tce_invalidate as a pnv_ioda_pe callback. This adds a pnv_iommu_table wrapper around iommu_table and stores a pointer to PE there. PNV's ppc_md.tce_build() call uses this to find PE and do the invalidation. This will be used later for Dynamic DMA windows too. This registers invalidate() callbacks for IODA1 and IODA2: - pnv_pci_ioda1_tce_invalidate; - pnv_pci_ioda2_tce_invalidate. Signed-off-by: Alexey Kardashevskiy --- Changes: v4: * changed commit log to explain why this change is needed --- arch/powerpc/platforms/powernv/pci-ioda.c | 33 +++ arch/powerpc/platforms/powernv/pci.c | 31 + arch/powerpc/platforms/powernv/pci.h | 13 +++- 3 files changed, 47 insertions(+), 30 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 9f28e18..007497f 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -462,7 +462,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev pe = &phb->ioda.pe_array[pdn->pe_number]; WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops); - set_iommu_table_base(&pdev->dev, &pe->tce32_table); + set_iommu_table_base(&pdev->dev, &pe->tce32.table); } static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb, @@ -489,7 +489,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb, } else { dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n"); set_dma_ops(&pdev->dev, &dma_iommu_ops); - set_iommu_table_base(&pdev->dev, &pe->tce32_table); + set_iommu_table_base(&pdev->dev, &pe->tce32.table); } return 0; } @@ -499,7 +499,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus) struct pci_dev *dev; list_for_each_entry(dev, &bus->devices, 
bus_list) { - set_iommu_table_base_and_group(&dev->dev, &pe->tce32_table); + set_iommu_table_base_and_group(&dev->dev, &pe->tce32.table); if (dev->subordinate) pnv_ioda_setup_bus_dma(pe, dev->subordinate); } @@ -584,19 +584,6 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe, } } -void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl, -__be64 *startp, __be64 *endp, bool rm) -{ - struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe, - tce32_table); - struct pnv_phb *phb = pe->phb; - - if (phb->type == PNV_PHB_IODA1) - pnv_pci_ioda1_tce_invalidate(pe, tbl, startp, endp, rm); - else - pnv_pci_ioda2_tce_invalidate(pe, tbl, startp, endp, rm); -} - static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe, unsigned int base, unsigned int segs) @@ -654,9 +641,11 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, } /* Setup linux iommu table */ - tbl = &pe->tce32_table; + tbl = &pe->tce32.table; pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs, base << 28, IOMMU_PAGE_SHIFT_4K); + pe->tce32.pe = pe; + pe->tce_invalidate = pnv_pci_ioda1_tce_invalidate; /* OPAL variant of P7IOC SW invalidated TCEs */ swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL); @@ -693,7 +682,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb, static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable) { struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe, - tce32_table); + tce32.table); uint16_t window_id = (pe->pe_number << 1 ) + 1; int64_t rc; @@ -734,10 +723,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb, pe->tce_bypass_base = 1ull << 59; /* Install set_bypass callback for VFIO */ - pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass; + pe->tce32.table.set_bypass = pnv_pci_ioda2_set_bypass; /* Enable bypass by default */ - pnv_pci_ioda2_set_bypass(&pe->tce32_table, true); + pnv_pci_ioda2_set_bypass(&pe->tce32.table, true); } static void 
pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb, @@ -785,9 +774,11 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb, } /* Setup linux iommu table */ - tbl = &pe->tce32_table; + tbl = &pe-
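The reason for the wrapper is that container_of(tbl, struct pnv_ioda_pe, ...) can name only one member, so it stops working once a PE owns two tables. Storing a back pointer next to each table avoids that. The userspace sketch below illustrates the idea under stated assumptions: the struct layouts are reduced stand-ins, the second table member (tce64) is hypothetical and only there to show the two-table case, and the invalidate callback merely counts calls.

```c
#include <stddef.h>
#include <string.h>

#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct iommu_table. */
struct table_sketch {
	unsigned long it_offset;
};

struct pe_sketch;

/* Wrapper: the table plus a pointer back to its PE. */
struct pe_table_sketch {
	struct pe_sketch *pe;
	struct table_sketch table;
};

/* Hypothetical stand-in for the PE, owning two wrapped tables. */
struct pe_sketch {
	int pe_number;
	struct pe_table_sketch tce32;
	struct pe_table_sketch tce64;	/* hypothetical second window */
	void (*tce_invalidate)(struct pe_sketch *pe, struct table_sketch *tbl);
	int invalidations;		/* counts callback invocations */
};

static void pe_invalidate(struct pe_sketch *pe, struct table_sketch *tbl)
{
	(void)tbl;
	pe->invalidations++;
}

static void pe_init(struct pe_sketch *pe, int number)
{
	memset(pe, 0, sizeof(*pe));
	pe->pe_number = number;
	pe->tce32.pe = pe;
	pe->tce64.pe = pe;
	pe->tce_invalidate = pe_invalidate;
}

/* Generic code holds only the table; the wrapper leads back to the PE. */
static void table_invalidate(struct table_sketch *tbl)
{
	struct pe_table_sketch *ptbl =
		container_of_sketch(tbl, struct pe_table_sketch, table);
	struct pe_sketch *pe = ptbl->pe;

	if (pe && pe->tce_invalidate)
		pe->tce_invalidate(pe, tbl);
}
```

container_of() is still used, but only from the table to its own wrapper, which is unambiguous regardless of how many tables the PE has.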
[PATCH v4 09/16] powerpc/iommu: Fix IOMMU ownership control functions
This adds missing locks in iommu_take_ownership()/ iommu_release_ownership(). This marks all pages busy in iommu_table::it_map in order to catch errors if there is an attempt to use this table while ownership over it is taken. This only clears TCE content if there is no page marked busy in it_map. Clearing must be done outside of the table locks as iommu_clear_tce() called from iommu_clear_tces_and_put_pages() does this. Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/kernel/iommu.c | 36 +--- 1 file changed, 29 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index 06984d5..c94b11d 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -1103,33 +1103,55 @@ EXPORT_SYMBOL_GPL(iommu_put_tce_user_mode); int iommu_take_ownership(struct iommu_table *tbl) { - unsigned long sz = (tbl->it_size + 7) >> 3; + unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; + int ret = 0, bit0 = 0; + + spin_lock_irqsave(&tbl->large_pool.lock, flags); + for (i = 0; i < tbl->nr_pools; i++) + spin_lock(&tbl->pools[i].lock); if (tbl->it_offset == 0) - clear_bit(0, tbl->it_map); + bit0 = test_and_clear_bit(0, tbl->it_map); if (!bitmap_empty(tbl->it_map, tbl->it_size)) { pr_err("iommu_tce: it_map is not empty"); - return -EBUSY; + ret = -EBUSY; + if (bit0) + set_bit(0, tbl->it_map); + } else { + memset(tbl->it_map, 0xff, sz); } - memset(tbl->it_map, 0xff, sz); - iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size); + for (i = 0; i < tbl->nr_pools; i++) + spin_unlock(&tbl->pools[i].lock); + spin_unlock_irqrestore(&tbl->large_pool.lock, flags); - return 0; + if (!ret) + iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, + tbl->it_size); + return ret; } EXPORT_SYMBOL_GPL(iommu_take_ownership); void iommu_release_ownership(struct iommu_table *tbl) { - unsigned long sz = (tbl->it_size + 7) >> 3; + unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, 
tbl->it_size);
+
+	spin_lock_irqsave(&tbl->large_pool.lock, flags);
+	for (i = 0; i < tbl->nr_pools; i++)
+		spin_lock(&tbl->pools[i].lock);
+
 	memset(tbl->it_map, 0, sz);
 
 	/* Restore bit#0 set by iommu_init_table() */
 	if (tbl->it_offset == 0)
 		set_bit(0, tbl->it_map);
+
+	for (i = 0; i < tbl->nr_pools; i++)
+		spin_unlock(&tbl->pools[i].lock);
+	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
 }
 EXPORT_SYMBOL_GPL(iommu_release_ownership);
-- 
2.0.0
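The bitmap handling in iommu_take_ownership() and iommu_release_ownership() can be followed in isolation: temporarily drop the reserved bit 0, refuse with EBUSY (restoring bit 0) if anything else is in use, otherwise mark every entry busy; on release, clear the map and restore bit 0. The userspace sketch below models just that, under stated assumptions: a fixed 64-entry byte-array map, hypothetical names throughout, and no locking, whereas the patch performs these steps under the pool locks.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAP_ENTRIES_SKETCH 64	/* assumption: small fixed-size table */

/* Hypothetical stand-in for the it_offset / it_map part of iommu_table. */
struct map_sketch {
	unsigned long it_offset;
	unsigned char it_map[MAP_ENTRIES_SKETCH / 8];
};

static bool map_is_empty(const struct map_sketch *t)
{
	size_t i;

	for (i = 0; i < sizeof(t->it_map); i++)
		if (t->it_map[i])
			return false;
	return true;
}

/* Mark the whole table busy, or fail if any entry is already in use. */
static int take_ownership_sketch(struct map_sketch *t)
{
	bool bit0 = false;

	/* Bit 0 is reserved when the table starts at offset 0; ignore it. */
	if (t->it_offset == 0) {
		bit0 = t->it_map[0] & 1u;
		t->it_map[0] &= (unsigned char)~1u;
	}

	if (!map_is_empty(t)) {
		if (bit0)
			t->it_map[0] |= 1u;	/* undo the temporary clear */
		return -EBUSY;
	}

	memset(t->it_map, 0xff, sizeof(t->it_map));
	return 0;
}

/* Hand the table back: clear everything, then restore the reserved bit. */
static void release_ownership_sketch(struct map_sketch *t)
{
	memset(t->it_map, 0, sizeof(t->it_map));
	if (t->it_offset == 0)
		t->it_map[0] |= 1u;
}
```

Filling the map with ones is what makes any later allocation attempt fail while the table is owned externally.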
[PATCH v4 12/16] powerpc/pseries/lpar: Enable VFIO
The previous patch introduced iommu_table_ops::set_and_get() callback which effectively disabled VFIO on pseries. This implements set_and_get() for pseries/lpar so VFIO can work under pHyp again. Since set_and_get() callback must return old TCE, it has to do H_GET_TCE for every TCE being replaced, therefore VFIO's performance under pHyp is expected to be slow. Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/platforms/pseries/iommu.c | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index 793f002..d3cded1 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -138,13 +138,14 @@ static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long); static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages, unsigned long uaddr, + unsigned long *old_tces, enum dma_data_direction direction, struct dma_attrs *attrs) { u64 rc = 0; u64 proto_tce, tce; u64 rpn; - int ret = 0; + int ret = 0, i = 0; long tcenum_start = tcenum, npages_start = npages; rpn = __pa(uaddr) >> TCE_SHIFT; @@ -154,6 +155,9 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum, while (npages--) { tce = proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT; + if (old_tces) + plpar_tce_get((u64)tbl->it_index, (u64)tcenum << 12, + &old_tces[i++]); rc = plpar_tce_put((u64)tbl->it_index, (u64)tcenum << 12, tce); if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) { @@ -179,8 +183,9 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum, static DEFINE_PER_CPU(__be64 *, tce_page); -static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum, +static int tce_set_and_get_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages, unsigned long uaddr, +unsigned long *old_tces, enum dma_data_direction direction, struct dma_attrs *attrs) { @@ -195,6 +200,7 @@ static int tce_buildmulti_pSeriesLP(struct 
iommu_table *tbl, long tcenum, if ((npages == 1) || !firmware_has_feature(FW_FEATURE_MULTITCE)) { return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr, + old_tces, direction, attrs); } @@ -211,6 +217,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum, if (!tcep) { local_irq_restore(flags); return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr, + old_tces, direction, attrs); } __get_cpu_var(tce_page) = tcep; @@ -232,6 +239,10 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum, for (l = 0; l < limit; l++) { tcep[l] = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT); rpn++; + if (old_tces) + plpar_tce_get((u64)tbl->it_index, + (u64)(tcenum + l) << 12, + &old_tces[tcenum + l]); } rc = plpar_tce_put_indirect((u64)tbl->it_index, @@ -262,6 +273,15 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum, return ret; } +static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum, +long npages, unsigned long uaddr, +enum dma_data_direction direction, +struct dma_attrs *attrs) +{ + return tce_set_and_get_pSeriesLP(tbl, tcenum, npages, uaddr, NULL, + direction, attrs); +} + static void tce_free_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages) { u64 rc; @@ -637,6 +657,7 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus) struct iommu_table_ops iommu_table_lpar_multi_ops = { .set = tce_buildmulti_pSeriesLP, + .set_and_get = tce_set_and_get_pSeriesLP, .clear = tce_freemulti_pSeriesLP, .get = tce_get_pSeriesLP }; -- 2.0.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
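Since the hypervisor interface offers separate "get" and "put" calls, set_and_get() on pseries has to be emulated as a read followed by a write for every entry, which is why the commit message expects it to be slow. The userspace sketch below models that two-step emulation. It rests on several assumptions: the table is a plain array standing in for the hypervisor-owned table, backend_get()/backend_put() are hypothetical stand-ins for the hcalls, new entries are simply consecutive values, and old_tces is indexed relative to the start of the request (old_tces[i]).

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_ENTRIES_SKETCH 16

/* Stand-in for a table owned by the hypervisor. */
static uint64_t hyp_table[TABLE_ENTRIES_SKETCH];

/* Hypothetical stand-ins for the "get" and "put" hypervisor calls. */
static uint64_t backend_get(long idx)
{
	return hyp_table[idx];
}

static void backend_put(long idx, uint64_t tce)
{
	hyp_table[idx] = tce;
}

/*
 * Write npages entries starting at tcenum. If old_tces is non-NULL, each
 * previous value is read first and stored at old_tces[i], i counting from
 * the start of this request. Returns 0 on success, -1 on a bad range.
 */
static int set_and_get_sketch(long tcenum, long npages, uint64_t first_tce,
			      uint64_t *old_tces)
{
	long i;

	if (tcenum < 0 || npages < 0 ||
	    tcenum + npages > TABLE_ENTRIES_SKETCH)
		return -1;

	for (i = 0; i < npages; i++) {
		if (old_tces)
			old_tces[i] = backend_get(tcenum + i);	/* extra call */
		backend_put(tcenum + i, first_tce + (uint64_t)i);
	}
	return 0;
}

/* Plain "set" is the same operation with nowhere to store old values. */
static int set_only_sketch(long tcenum, long npages, uint64_t first_tce)
{
	return set_and_get_sketch(tcenum, npages, first_tce, NULL);
}
```

Passing NULL for old_tces skips the extra read, so callers that do not need the old values pay nothing for the new interface.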
[PATCH v4 11/16] powerpc/powernv: Release replaced TCE
At the moment, writing a new TCE value to the IOMMU table fails with EBUSY
if there is a valid entry already. However, the PAPR specification allows
the guest to write a new TCE value without clearing it first.

This adds a set_and_get() callback to iommu_table_ops which does the same
thing as set() plus it returns the replaced TCE(s) so the caller can
release the pages afterwards.

This makes iommu_tce_build() put pages returned by set_and_get().

Since we now depend on the permission bits in TCE entries, this preserves
those bits in the TCE in iommu_put_tce_user_mode().

This removes the use of pool locks as those locks serve TCE allocations
rather than IOMMU table access, and the new set_and_get() callback
provides a lockless way of releasing pages safely.

This disables external IOMMU use (i.e. VFIO) for IOMMUs which do not
implement the set_and_get() callback. Therefore the "powernv" platform is
the only supported one.

Signed-off-by: Alexey Kardashevskiy
---
Changes:
v4:
* this is merge+rework of
    powerpc/powernv: Return non-zero TCE from pnv_tce_build
    powerpc/iommu: Implement put_page() if TCE had non-zero value
    powerpc/iommu: Extend ppc_md.tce_build(_rm) to return old TCE values
---
 arch/powerpc/include/asm/iommu.h | 6 ++
 arch/powerpc/kernel/iommu.c | 28 +++-
 arch/powerpc/platforms/powernv/pci.c | 29 +++--
 3 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index c725e4a..4b13e4e 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -49,6 +49,12 @@ struct iommu_table_ops {
 			unsigned long uaddr,
 			enum dma_data_direction direction,
 			struct dma_attrs *attrs);
+	int (*set_and_get)(struct iommu_table *tbl,
+			long index, long npages,
+			unsigned long uaddr,
+			unsigned long *old_tces,
+			enum dma_data_direction direction,
+			struct dma_attrs *attrs);
 	void (*clear)(struct iommu_table *tbl,
 			long index, long npages);
 	unsigned long (*get)(struct iommu_table *tbl, long index);
diff --git
a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index 6a86788..ad52e00 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -1007,9 +1007,6 @@ EXPORT_SYMBOL_GPL(iommu_tce_put_param_check); unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry) { unsigned long oldtce; - struct iommu_pool *pool = get_pool(tbl, entry); - - spin_lock(&(pool->lock)); oldtce = tbl->it_ops->get(tbl, entry); if (oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)) @@ -1017,8 +1014,6 @@ unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry) else oldtce = 0; - spin_unlock(&(pool->lock)); - return oldtce; } EXPORT_SYMBOL_GPL(iommu_clear_tce); @@ -1056,16 +1051,12 @@ int iommu_tce_build(struct iommu_table *tbl, unsigned long entry, { int ret = -EBUSY; unsigned long oldtce; - struct iommu_pool *pool = get_pool(tbl, entry); - spin_lock(&(pool->lock)); + ret = tbl->it_ops->set_and_get(tbl, entry, 1, hwaddr, &oldtce, + direction, NULL); - oldtce = tbl->it_ops->get(tbl, entry); - /* Add new entry if it is not busy */ - if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ))) - ret = tbl->it_ops->set(tbl, entry, 1, hwaddr, direction, NULL); - - spin_unlock(&(pool->lock)); + if (oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)) + put_page(pfn_to_page(__pa(oldtce) >> PAGE_SHIFT)); /* if (unlikely(ret)) pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx ret=%d\n", @@ -1092,6 +1083,7 @@ int iommu_put_tce_user_mode(struct iommu_table *tbl, unsigned long entry, return -EFAULT; } hwaddr = (unsigned long) page_address(page) + offset; + hwaddr |= tce & (TCE_PCI_READ | TCE_PCI_WRITE); ret = iommu_tce_build(tbl, entry, hwaddr, direction); if (ret) @@ -1110,6 +1102,16 @@ int iommu_take_ownership(struct iommu_table *tbl) unsigned long flags, i, sz = (tbl->it_size + 7) >> 3; int ret = 0, bit0 = 0; + /* +* VFIO does not control TCE entries allocation and the guest +* can write new TCEs on top of existing ones so iommu_tce_build() +* must be able to release 
old pages. This functionality +* requires set_and_get() callback defined so if it is not +* implemented, we disallow taking ownership over the table. +*/ + if (!tbl->it_ops->set_and_get) + return -EINVAL; + spin_lock_irqsave(&tbl->large_pool.lock, flags); for (i = 0; i < tbl->nr_pools; i++)
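The replace-and-release flow described in this patch can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: toy_table, toy_set_and_get() and toy_build() are hypothetical names, the permission bit values are made up, and a counter stands in for the put_page() calls.

```c
#include <stddef.h>

/* Hypothetical permission bits, playing the role of TCE_PCI_READ/WRITE. */
#define TOY_READ	0x1UL
#define TOY_WRITE	0x2UL

#define TOY_ENTRIES	8

struct toy_table {
	unsigned long tce[TOY_ENTRIES];
	int released;		/* stands in for put_page() calls */
};

/* Store a new entry and hand the previous one back to the caller. */
static int toy_set_and_get(struct toy_table *tbl, size_t index,
			   unsigned long new_tce, unsigned long *old_tce)
{
	if (index >= TOY_ENTRIES)
		return -1;

	*old_tce = tbl->tce[index];
	tbl->tce[index] = new_tce;
	return 0;
}

/* Replace an entry; release the old page only if the old entry was valid. */
static int toy_build(struct toy_table *tbl, size_t index,
		     unsigned long new_tce)
{
	unsigned long old = 0;
	int ret = toy_set_and_get(tbl, index, new_tce, &old);

	if (!ret && (old & (TOY_READ | TOY_WRITE)))
		tbl->released++;

	return ret;
}
```

Writing twice to the same index releases exactly one "page": the first write finds an empty entry, the second finds a valid one and drops it. No lock is taken around the get/set pair because the old value comes back from the same call that stores the new one.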
[PATCH v4 10/16] powerpc: Move tce_xxx callbacks from ppc_md to iommu_table
This adds an iommu_table_ops struct and puts a pointer to it into the iommu_table struct. This moves the tce_build/tce_free/tce_get/tce_flush callbacks from ppc_md to the new struct, where they really belong.

This adds an extra @ops parameter to iommu_init_table() to make sure that we do not leave any IOMMU table without iommu_table_ops. @it_ops is initialized at the very beginning because iommu_init_table() calls iommu_table_clear() and the latter already uses the callbacks.

This does s/tce_build/set/ and s/tce_free/clear/, and removes the "tce_" prefixes for better readability.

This removes the tce_xxx_rm handlers from ppc_md as well but does not add them to iommu_table_ops; this will be done later if we decide to support TCE hypercalls in real mode.

This always uses tce_buildmulti_pSeriesLP/tce_freemulti_pSeriesLP as callbacks for pseries. This changes the "multi" callbacks to fall back to tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not present. The reason for this is that we still have to support the "multitce=off" boot parameter in disable_multitce() and we do not want to walk through all IOMMU tables in the system and replace the "multi" callbacks with single ones.
Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/include/asm/iommu.h| 20 +++- arch/powerpc/include/asm/machdep.h | 25 --- arch/powerpc/kernel/iommu.c | 50 - arch/powerpc/kernel/vio.c | 5 ++- arch/powerpc/platforms/cell/iommu.c | 9 -- arch/powerpc/platforms/pasemi/iommu.c | 8 +++-- arch/powerpc/platforms/powernv/pci-ioda.c | 4 +-- arch/powerpc/platforms/powernv/pci-p5ioc2.c | 3 +- arch/powerpc/platforms/powernv/pci.c| 24 -- arch/powerpc/platforms/powernv/pci.h| 1 + arch/powerpc/platforms/pseries/iommu.c | 42 +--- arch/powerpc/sysdev/dart_iommu.c| 13 12 files changed, 102 insertions(+), 102 deletions(-) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 2b0b01d..c725e4a 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -43,6 +43,22 @@ extern int iommu_is_off; extern int iommu_force_on; +struct iommu_table_ops { + int (*set)(struct iommu_table *tbl, + long index, long npages, + unsigned long uaddr, + enum dma_data_direction direction, + struct dma_attrs *attrs); + void (*clear)(struct iommu_table *tbl, + long index, long npages); + unsigned long (*get)(struct iommu_table *tbl, long index); + void (*flush)(struct iommu_table *tbl); +}; + +/* These are used by VIO */ +extern struct iommu_table_ops iommu_table_lpar_multi_ops; +extern struct iommu_table_ops iommu_table_pseries_ops; + /* * IOMAP_MAX_ORDER defines the largest contiguous block * of dma space we can get. 
IOMAP_MAX_ORDER = 13 @@ -77,6 +93,7 @@ struct iommu_table { #ifdef CONFIG_IOMMU_API struct iommu_group *it_group; #endif + struct iommu_table_ops *it_ops; }; /* Pure 2^n version of get_order */ @@ -106,7 +123,8 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name); * structure */ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl, - int nid); + int nid, + struct iommu_table_ops *ops); struct spapr_tce_iommu_ops; #ifdef CONFIG_IOMMU_API diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index f92b0b5..0a2ec04 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -65,31 +65,6 @@ struct machdep_calls { * destroyed as well */ void(*hpte_clear_all)(void); - int (*tce_build)(struct iommu_table *tbl, -long index, -long npages, -unsigned long uaddr, -enum dma_data_direction direction, -struct dma_attrs *attrs); - void(*tce_free)(struct iommu_table *tbl, - long index, - long npages); - unsigned long (*tce_get)(struct iommu_table *tbl, - long index); - void(*tce_flush)(struct iommu_table *tbl); - - /* _rm versions are for real mode use only */ - int (*tce_build_rm)(struct iommu_table *tbl, -long index, -long npages, -unsigned long uaddr, -enum dma_data_direction direction, -
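The per-table ops pointer and the "multi" fallback from the commit message can be sketched as follows. This is a userspace illustration only: every toy_* name is hypothetical, and a plain bool stands in for the firmware_has_feature(FW_FEATURE_MULTITCE) check.

```c
#include <stdbool.h>

struct toy_table;

/* Callbacks live with the table instead of in a global machine struct. */
struct toy_table_ops {
	int (*set)(struct toy_table *tbl, long index, long npages);
};

struct toy_table {
	const struct toy_table_ops *ops;
	long single_calls;	/* entries written one at a time */
	long multi_calls;	/* batched writes */
};

/* Stand-in for the firmware feature check; "multitce=off" would clear it. */
static bool toy_multitce_enabled = true;

static int toy_set_single(struct toy_table *tbl, long index, long npages)
{
	(void)index;
	tbl->single_calls += npages;
	return 0;
}

/* The "multi" callback decides at call time whether to fall back. */
static int toy_set_multi(struct toy_table *tbl, long index, long npages)
{
	if (npages == 1 || !toy_multitce_enabled)
		return toy_set_single(tbl, index, npages);

	tbl->multi_calls++;
	return 0;
}

static const struct toy_table_ops toy_multi_ops = {
	.set = toy_set_multi,
};

/* Taking ops at init time means no table is left without callbacks. */
static void toy_init_table(struct toy_table *tbl,
			   const struct toy_table_ops *ops)
{
	tbl->ops = ops;
	tbl->single_calls = 0;
	tbl->multi_calls = 0;
}
```

Because the fallback is evaluated inside the callback, flipping the feature flag later does not require visiting every table to swap its ops pointer.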
[PATCH v4 14/16] vfio: powerpc/spapr: Reuse locked_vm accounting helpers
There are helpers to account locked pages in the locked_vm counter; this
reuses these helpers in the VFIO-SPAPR-IOMMU driver.

While we are here, update the comment explaining why RLIMIT_MEMLOCK
might be required to be bigger than the entire guest RAM.

Signed-off-by: Alexey Kardashevskiy
---
Changes:
v4:
* added comment explaining how big the ulimit should be
* used try_increment_locked_vm/decrement_locked_vm
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index d9845af..6ed0fc3 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -58,7 +58,6 @@ static void tce_iommu_take_ownership_notify(struct spapr_tce_iommu_group *data,
 static int tce_iommu_enable(struct tce_container *container)
 {
 	int ret = 0;
-	unsigned long locked, lock_limit, npages;
 	struct iommu_table *tbl;
 	struct spapr_tce_iommu_group *data;

@@ -92,24 +91,24 @@ static int tce_iommu_enable(struct tce_container *container)
 	 * Also we don't have a nice way to fail on H_PUT_TCE due to ulimits,
 	 * that would effectively kill the guest at random points, much better
 	 * enforcing the limit based on the max that the guest can map.
+	 *
+	 * Unfortunately at the moment it counts whole tables, no matter how
+	 * much memory the guest has. I.e. for 4GB guest and 4 IOMMU groups
+	 * each with 2GB DMA window, 8GB will be counted here. The reason for
+	 * this is that we cannot tell here the amount of RAM used by the guest
+	 * as this information is only available from KVM and VFIO is
+	 * KVM agnostic.
 	 */
 	tbl = data->ops->get_table(data, TCE_DEFAULT_WINDOW);
 	if (!tbl)
 		return -ENXIO;

-	down_write(&current->mm->mmap_sem);
-	npages = (tbl->it_size << IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
-	locked = current->mm->locked_vm + npages;
-	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-	if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
-		pr_warn("RLIMIT_MEMLOCK (%ld) exceeded\n",
-				rlimit(RLIMIT_MEMLOCK));
-		ret = -ENOMEM;
-	} else {
-		current->mm->locked_vm += npages;
-		container->enabled = true;
-	}
-	up_write(&current->mm->mmap_sem);
+	ret = try_increment_locked_vm((tbl->it_size << IOMMU_PAGE_SHIFT_4K) >>
+			PAGE_SHIFT);
+	if (ret)
+		return ret;
+
+	container->enabled = true;

 	return ret;
 }
@@ -135,10 +134,8 @@ static void tce_iommu_disable(struct tce_container *container)
 	if (!tbl)
 		return;

-	down_write(&current->mm->mmap_sem);
-	current->mm->locked_vm -= (tbl->it_size <<
-			IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
-	up_write(&current->mm->mmap_sem);
+	decrement_locked_vm((tbl->it_size << IOMMU_PAGE_SHIFT_4K) >>
+			PAGE_SHIFT);
 }

 static void *tce_iommu_open(unsigned long arg)
--
2.0.0
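The accounting arithmetic in this patch, including the "4 groups with 2GB windows count as 8GB" case from the new comment, can be checked with a small userspace model. This is a sketch under assumptions: the toy_* names are hypothetical, a 64K system page size is assumed, and a plain struct replaces current->mm and the capability check.

```c
#include <stdbool.h>

#define TOY_PAGE_SHIFT		16	/* assumed 64K system pages */
#define TOY_IOMMU_SHIFT_4K	12	/* 4K IOMMU pages, as in the patch */

struct toy_mm {
	unsigned long locked_vm;	/* system pages currently accounted */
	unsigned long lock_limit;	/* limit, in system pages */
	bool can_ipc_lock;		/* stand-in for capable(CAP_IPC_LOCK) */
};

/* System pages covered by a table holding it_size 4K entries. */
static unsigned long toy_table_pages(unsigned long it_size)
{
	return (it_size << TOY_IOMMU_SHIFT_4K) >> TOY_PAGE_SHIFT;
}

/* Account npages unless that would exceed the limit. */
static int toy_try_increment_locked_vm(struct toy_mm *mm,
				       unsigned long npages)
{
	unsigned long locked = mm->locked_vm + npages;

	if (locked > mm->lock_limit && !mm->can_ipc_lock)
		return -1;	/* the removed kernel code used -ENOMEM */

	mm->locked_vm = locked;
	return 0;
}

/* Undo the accounting, never dropping below zero. */
static void toy_decrement_locked_vm(struct toy_mm *mm, unsigned long npages)
{
	if (npages > mm->locked_vm)
		npages = mm->locked_vm;
	mm->locked_vm -= npages;
}
```

A 2GB window with 4K entries has 1 << 19 entries, which is 32768 system pages of 64K each; four such windows need a limit of 131072 pages (8GB) regardless of how much RAM the guest really has.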
[PATCH v4 13/16] powerpc/powernv: Implement Dynamic DMA windows (DDW) for IODA
SPAPR defines an interface to create additional DMA windows dynamically.
"Dynamically" means that the window is not allocated at guest start and
the guest can request it later. In practice, existing Linux guests check
for the capability and, if it is there, they create and map one big DMA
window as big as the entire guest RAM.

SPAPR defines 4 RTAS calls for this feature which userspace implements.
This adds 4 callbacks into the spapr_tce_iommu_ops struct:
1. query - ibm,query-pe-dma-window - returns the number/size of windows
which can be created (one, any page size);
2. create - ibm,create-pe-dma-window - creates a window;
3. remove - ibm,remove-pe-dma-window - removes a window; only the
additional window created by create() can be removed, the default 32bit
window cannot be removed as guests do not expect new windows to start
from zero;
4. reset - ibm,reset-pe-dma-window - resets the DMA windows configuration
to the default state; for now it only removes the additional window if it
was created.

The next patch will add corresponding ioctls to the VFIO SPAPR TCE driver
to pass RTAS calls from userspace to the IODA code.
Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/include/asm/tce.h| 21 arch/powerpc/platforms/powernv/pci-ioda.c | 158 +- arch/powerpc/platforms/powernv/pci.h | 2 + 3 files changed, 180 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h index 5ee4987..583463b 100644 --- a/arch/powerpc/include/asm/tce.h +++ b/arch/powerpc/include/asm/tce.h @@ -60,6 +60,27 @@ struct spapr_tce_iommu_ops { phys_addr_t addr); void (*take_ownership)(struct spapr_tce_iommu_group *data, bool enable); + + /* Dynamic DMA window */ + /* Page size flags for ibm,query-pe-dma-window */ +#define DDW_PGSIZE_4K 0x01 +#define DDW_PGSIZE_64K 0x02 +#define DDW_PGSIZE_16M 0x04 +#define DDW_PGSIZE_32M 0x08 +#define DDW_PGSIZE_64M 0x10 +#define DDW_PGSIZE_128M 0x20 +#define DDW_PGSIZE_256M 0x40 +#define DDW_PGSIZE_16G 0x80 + long (*query)(struct spapr_tce_iommu_group *data, + __u32 *windows_available, + __u32 *page_size_mask); + long (*create)(struct spapr_tce_iommu_group *data, + __u32 page_shift, + __u32 window_shift, + struct iommu_table **ptbl); + long (*remove)(struct spapr_tce_iommu_group *data, + struct iommu_table *tbl); + long (*reset)(struct spapr_tce_iommu_group *data); }; struct spapr_tce_iommu_group { diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 7482518..6a847b2 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -754,6 +754,24 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb, pnv_pci_ioda2_set_bypass(pe, true); } +static struct iommu_table *pnv_ioda2_iommu_get_table( + struct spapr_tce_iommu_group *data, + phys_addr_t addr) +{ + struct pnv_ioda_pe *pe = data->iommu_owner; + + if (addr == TCE_DEFAULT_WINDOW) + return &pe->tce32.table; + + if (pnv_pci_ioda_check_addr(&pe->tce64.table, addr)) + return &pe->tce64.table; + + if (pnv_pci_ioda_check_addr(&pe->tce32.table, addr)) + return &pe->tce32.table; + + 
return NULL; +} + static void pnv_ioda2_take_ownership(struct spapr_tce_iommu_group *data, bool enable) { @@ -762,9 +780,147 @@ static void pnv_ioda2_take_ownership(struct spapr_tce_iommu_group *data, pnv_pci_ioda2_set_bypass(pe, !enable); } +static long pnv_pci_ioda2_ddw_query(struct spapr_tce_iommu_group *data, + __u32 *windows_available, __u32 *page_size_mask) +{ + struct pnv_ioda_pe *pe = data->iommu_owner; + + if (pe->tce64_active) { + *page_size_mask = 0; + *windows_available = 0; + } else { + *page_size_mask = + DDW_PGSIZE_4K | + DDW_PGSIZE_64K | + DDW_PGSIZE_16M; + *windows_available = 1; + } + + return 0; +} + +static long pnv_pci_ioda2_ddw_create(struct spapr_tce_iommu_group *data, + __u32 page_shift, __u32 window_shift, + struct iommu_table **ptbl) +{ + struct pnv_ioda_pe *pe = data->iommu_owner; + struct pnv_phb *phb = pe->phb; + struct page *tce_mem = NULL; + void *addr; + long ret; + unsigned long tce_table_size = + (1ULL << (window_shift - page_shift)) * 8; + unsigned order; + struct iommu_table *tbl64 = &pe->tce64.table; + + if ((page_shift != 12) && (page_shift != 16) && (page_shif
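The table size expression used in pnv_pci_ioda2_ddw_create() above, (1ULL << (window_shift - page_shift)) * 8, is easy to check by hand. The sketch below is a userspace illustration with a hypothetical function name; the range checks are an addition for safety here and are not taken from the patch.

```c
#include <stdint.h>

/*
 * Bytes needed for a TCE table covering a window of 2^window_shift bytes
 * with IOMMU pages of 2^page_shift bytes; each entry is 8 bytes.
 * Returns 0 for shifts that cannot be represented (an assumption of
 * this sketch, not behaviour of the kernel code).
 */
static uint64_t toy_tce_table_size(uint32_t page_shift, uint32_t window_shift)
{
	if (window_shift < page_shift || window_shift - page_shift >= 61)
		return 0;

	return (UINT64_C(1) << (window_shift - page_shift)) * 8;
}
```

For a 4GB window (window_shift 32), 4K pages give 2^20 entries and an 8MB table, while 64K pages give 2^16 entries and a 512KB table.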
[PATCH v4 15/16] vfio: powerpc/spapr: Use it_page_size
This makes use of the it_page_size from the iommu_table struct as page size can differ. This replaces missing IOMMU_PAGE_SHIFT macro in commented debug code as recently introduced IOMMU_PAGE_XXX macros do not include IOMMU_PAGE_SHIFT. Signed-off-by: Alexey Kardashevskiy --- drivers/vfio/vfio_iommu_spapr_tce.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c index 6ed0fc3..48b256c 100644 --- a/drivers/vfio/vfio_iommu_spapr_tce.c +++ b/drivers/vfio/vfio_iommu_spapr_tce.c @@ -103,7 +103,7 @@ static int tce_iommu_enable(struct tce_container *container) if (!tbl) return -ENXIO; - ret = try_increment_locked_vm((tbl->it_size << IOMMU_PAGE_SHIFT_4K) >> + ret = try_increment_locked_vm((tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT); if (ret) return ret; @@ -134,7 +134,7 @@ static void tce_iommu_disable(struct tce_container *container) if (!tbl) return; - decrement_locked_vm((tbl->it_size << IOMMU_PAGE_SHIFT_4K) >> + decrement_locked_vm((tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT); } @@ -207,8 +207,8 @@ static long tce_iommu_ioctl(void *iommu_data, if (info.argsz < minsz) return -EINVAL; - info.dma32_window_start = tbl->it_offset << IOMMU_PAGE_SHIFT_4K; - info.dma32_window_size = tbl->it_size << IOMMU_PAGE_SHIFT_4K; + info.dma32_window_start = tbl->it_offset << tbl->it_page_shift; + info.dma32_window_size = tbl->it_size << tbl->it_page_shift; info.flags = 0; if (copy_to_user((void __user *)arg, &info, minsz)) @@ -261,17 +261,17 @@ static long tce_iommu_ioctl(void *iommu_data, if (ret) return ret; - for (i = 0; i < (param.size >> IOMMU_PAGE_SHIFT_4K); ++i) { + for (i = 0; i < (param.size >> tbl->it_page_shift); ++i) { ret = iommu_put_tce_user_mode(tbl, - (param.iova >> IOMMU_PAGE_SHIFT_4K) + i, + (param.iova >> tbl->it_page_shift) + i, tce); if (ret) break; - tce += IOMMU_PAGE_SIZE_4K; + tce += IOMMU_PAGE_SIZE(tbl); } if (ret) iommu_clear_tces_and_put_pages(tbl, - 
param.iova >> IOMMU_PAGE_SHIFT_4K, i); + param.iova >> tbl->it_page_shift, i); iommu_flush_tce(tbl); @@ -312,13 +312,13 @@ static long tce_iommu_ioctl(void *iommu_data, BUG_ON(!tbl->it_group); ret = iommu_tce_clear_param_check(tbl, param.iova, 0, - param.size >> IOMMU_PAGE_SHIFT_4K); + param.size >> tbl->it_page_shift); if (ret) return ret; ret = iommu_clear_tces_and_put_pages(tbl, - param.iova >> IOMMU_PAGE_SHIFT_4K, - param.size >> IOMMU_PAGE_SHIFT_4K); + param.iova >> tbl->it_page_shift, + param.size >> tbl->it_page_shift); iommu_flush_tce(tbl); return ret; -- 2.0.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
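The effect of switching from the fixed IOMMU_PAGE_SHIFT_4K to the per-table it_page_shift can be illustrated with a few conversions. This is a userspace sketch: toy_table is a hypothetical, trimmed-down stand-in for struct iommu_table and the helper names are made up for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields of struct iommu_table used here. */
struct toy_table {
	uint64_t it_offset;		/* first entry of the window, in IOMMU pages */
	uint64_t it_size;		/* number of entries in the window */
	unsigned int it_page_shift;	/* log2 of the IOMMU page size */
};

/* Bus address at which the window starts. */
static uint64_t toy_window_start(const struct toy_table *tbl)
{
	return tbl->it_offset << tbl->it_page_shift;
}

/* Size of the window in bytes. */
static uint64_t toy_window_size(const struct toy_table *tbl)
{
	return tbl->it_size << tbl->it_page_shift;
}

/* Index of the table entry that maps bus address iova. */
static uint64_t toy_iova_to_entry(const struct toy_table *tbl, uint64_t iova)
{
	return iova >> tbl->it_page_shift;
}

/* Number of table entries needed to cover size bytes. */
static uint64_t toy_size_to_entries(const struct toy_table *tbl, uint64_t size)
{
	return size >> tbl->it_page_shift;
}
```

The same 2GB window needs 0x80000 entries with 4K pages but only 0x8000 with 64K pages, so a hardcoded 4K shift would compute wrong sizes and indices for any table that uses a larger page.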
[PATCH v4 16/16] vfio: powerpc/spapr: Enable Dynamic DMA windows
This defines and implements VFIO IOMMU API required to support Dynamic DMA windows defined in the SPAPR specification. The ioctl handlers implement host-size part of corresponding RTAS calls: - VFIO_IOMMU_SPAPR_TCE_QUERY - ibm,query-pe-dma-window; - VFIO_IOMMU_SPAPR_TCE_CREATE - ibm,create-pe-dma-window; - VFIO_IOMMU_SPAPR_TCE_REMOVE - ibm,remove-pe-dma-window; - VFIO_IOMMU_SPAPR_TCE_RESET - ibm,reset-pe-dma-window. The VFIO IOMMU driver does basic sanity checks and calls corresponding SPAPR TCE functions. At the moment only IODA2 (POWER8 PCI host bridge) implements them. This advertises VFIO_IOMMU_SPAPR_TCE_FLAG_DDW capability via VFIO_IOMMU_SPAPR_TCE_GET_INFO. This calls reset() when IOMMU is being disabled (happens when VFIO stops using it). Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/platforms/powernv/pci-ioda.c | 1 + drivers/vfio/vfio_iommu_spapr_tce.c | 173 +- include/uapi/linux/vfio.h | 37 ++- 3 files changed, 209 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index 6a847b2..f51afe2 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -859,6 +859,7 @@ static long pnv_pci_ioda2_ddw_create(struct spapr_tce_iommu_group *data, /* Copy "invalidate" register address */ tbl64->it_index = pe->tce32.table.it_index; + tbl64->it_group = pe->tce32.table.it_group; tbl64->it_type = TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE | TCE_PCI_SWINV_PAIR; tbl64->it_map = (void *) 0xDEADBEEF; /* poison */ diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c index 48b256c..32e2804 100644 --- a/drivers/vfio/vfio_iommu_spapr_tce.c +++ b/drivers/vfio/vfio_iommu_spapr_tce.c @@ -45,6 +45,7 @@ struct tce_container { struct mutex lock; struct iommu_group *grp; bool enabled; + unsigned long start64; }; @@ -123,19 +124,36 @@ static void tce_iommu_disable(struct tce_container *container) container->enabled = false; - if 
(!container->grp || !current->mm) + if (!container->grp) return; data = iommu_group_get_iommudata(container->grp); if (!data || !data->iommu_owner || !data->ops->get_table) return; + /* Try resetting, there might have been a 64bit window */ + if (data->ops->reset) + data->ops->reset(data); + + if (!current->mm) + return; + tbl = data->ops->get_table(data, TCE_DEFAULT_WINDOW); if (!tbl) return; decrement_locked_vm((tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT); + + if (!container->start64) + return; + + tbl = data->ops->get_table(data, container->start64); + if (!tbl) + return; + + decrement_locked_vm((tbl->it_size << tbl->it_page_shift) >> + PAGE_SHIFT); } static void *tce_iommu_open(unsigned long arg) @@ -210,6 +228,8 @@ static long tce_iommu_ioctl(void *iommu_data, info.dma32_window_start = tbl->it_offset << tbl->it_page_shift; info.dma32_window_size = tbl->it_size << tbl->it_page_shift; info.flags = 0; + if (data->ops->query && data->ops->create && data->ops->remove) + info.flags |= VFIO_IOMMU_SPAPR_TCE_FLAG_DDW; if (copy_to_user((void __user *)arg, &info, minsz)) return -EFAULT; @@ -335,6 +355,157 @@ static long tce_iommu_ioctl(void *iommu_data, tce_iommu_disable(container); mutex_unlock(&container->lock); return 0; + + case VFIO_IOMMU_SPAPR_TCE_QUERY: { + struct vfio_iommu_spapr_tce_query query; + struct spapr_tce_iommu_group *data; + + if (WARN_ON(!container->grp)) + return -ENXIO; + + data = iommu_group_get_iommudata(container->grp); + + minsz = offsetofend(struct vfio_iommu_spapr_tce_query, + page_size_mask); + + if (copy_from_user(&query, (void __user *)arg, minsz)) + return -EFAULT; + + if (query.argsz < minsz) + return -EINVAL; + + if (!data->ops->query || !data->iommu_owner) + return -ENOSYS; + + ret = data->ops->query(data, + &query.windows_available, + &query.page_size_mask); + + if (ret) + return ret; + + if (copy_to_user((void __user *)arg, &query, minsz)) + return -EFAULT; + + return 0; + } + case VFIO_IOMMU_SPAPR_TCE_CREATE:
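The argsz/minsz check in the VFIO_IOMMU_SPAPR_TCE_QUERY handler above follows a common pattern for extensible ioctl structs. Here is a userspace sketch of it under assumptions: toy_query is a hypothetical struct (the real layout of vfio_iommu_spapr_tce_query is not shown in full in this thread), memcpy() stands in for copy_from_user()/copy_to_user(), and the returned values are simply the ones the IODA2 query callback reports when no 64bit window is active.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset of the first byte after a struct member. */
#define toy_offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical ioctl argument; "reserved" models a later extension. */
struct toy_query {
	uint32_t argsz;
	uint32_t flags;
	uint32_t windows_available;
	uint32_t page_size_mask;
	uint32_t reserved;
};

/* Returns 0 on success, -1 if the caller's struct is too small. */
static int toy_query_ioctl(void *arg)
{
	struct toy_query q;
	size_t minsz = toy_offsetofend(struct toy_query, page_size_mask);

	/* Copy in only the part this handler understands. */
	memcpy(&q, arg, minsz);
	if (q.argsz < minsz)
		return -1;

	q.windows_available = 1;
	q.page_size_mask = 0x01 | 0x02 | 0x04;	/* 4K, 64K, 16M flags */

	/* Copy out the same prefix; later fields stay untouched. */
	memcpy(arg, &q, minsz);
	return 0;
}
```

Since only the first minsz bytes are read and written, a newer caller with a larger struct and an older handler can still interoperate.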
[PATCH v4 07/16] powerpc/spapr: vfio: Implement spapr_tce_iommu_ops
Modern IBM POWERPC systems support multiple IOMMU tables per PE, so we
need a more reliable way (compared to container_of()) to get a PE pointer
from the iommu_table struct pointer used in IOMMU functions.

At the moment the IOMMU group data points to an iommu_table struct. This
introduces a spapr_tce_iommu_group struct which keeps an iommu_owner
and a spapr_tce_iommu_ops struct. For IODA, iommu_owner is a pointer to
the pnv_ioda_pe struct; for others it is still a pointer to
the iommu_table struct. The ops struct corresponds to the type which
iommu_owner points to.

At the moment a get_table() callback is the only one. It returns
an iommu_table for a bus address.

As the IOMMU group data pointer now points to a variable type instead of
iommu_table, the VFIO SPAPR TCE driver is fixed to use the new type.
This changes the tce_container struct to keep an iommu_group instead of
an iommu_table.

So, it was:
- iommu_table points to iommu_group via iommu_table::it_group;
- iommu_group points to iommu_table via iommu_group_get_iommudata();

now it is:
- iommu_table points to iommu_group via iommu_table::it_group;
- iommu_group points to spapr_tce_iommu_group via
iommu_group_get_iommudata();
- spapr_tce_iommu_group points to either (depending on .get_table()):
	- iommu_table;
	- pnv_ioda_pe;

This uses pnv_ioda1_iommu_get_table for both IODA1&2 but IODA2 will
have its own pnv_ioda2_iommu_get_table soon and pnv_ioda1_iommu_get_table
will only be used for IODA1.
Signed-off-by: Alexey Kardashevskiy --- arch/powerpc/include/asm/iommu.h| 6 ++ arch/powerpc/include/asm/tce.h | 15 arch/powerpc/kernel/iommu.c | 34 - arch/powerpc/platforms/powernv/pci-ioda.c | 39 +- arch/powerpc/platforms/powernv/pci-p5ioc2.c | 1 + arch/powerpc/platforms/powernv/pci.c| 2 +- arch/powerpc/platforms/pseries/iommu.c | 10 ++- drivers/vfio/vfio_iommu_spapr_tce.c | 113 +--- 8 files changed, 184 insertions(+), 36 deletions(-) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index 42632c7..84ee339 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -108,13 +108,19 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name); */ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl, int nid); + +struct spapr_tce_iommu_ops; #ifdef CONFIG_IOMMU_API extern void iommu_register_group(struct iommu_table *tbl, +void *iommu_owner, +struct spapr_tce_iommu_ops *ops, int pci_domain_number, unsigned long pe_num); extern int iommu_add_device(struct device *dev); extern void iommu_del_device(struct device *dev); #else static inline void iommu_register_group(struct iommu_table *tbl, + void *iommu_owner, + struct spapr_tce_iommu_ops *ops, int pci_domain_number, unsigned long pe_num) { diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h index 743f36b..8bfe98f 100644 --- a/arch/powerpc/include/asm/tce.h +++ b/arch/powerpc/include/asm/tce.h @@ -50,5 +50,20 @@ #define TCE_PCI_READ 0x1 /* read from PCI allowed */ #define TCE_VB_WRITE 0x1 /* write from VB allowed */ +struct spapr_tce_iommu_group; + +#define TCE_DEFAULT_WINDOW ~(0ULL) + +struct spapr_tce_iommu_ops { + struct iommu_table *(*get_table)( + struct spapr_tce_iommu_group *data, + phys_addr_t addr); +}; + +struct spapr_tce_iommu_group { + void *iommu_owner; + struct spapr_tce_iommu_ops *ops; +}; + #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_TCE_H */ diff --git a/arch/powerpc/kernel/iommu.c 
b/arch/powerpc/kernel/iommu.c index f84f799..e203314 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -877,24 +877,52 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size, */ static void group_release(void *iommu_data) { - struct iommu_table *tbl = iommu_data; - tbl->it_group = NULL; + kfree(iommu_data); } +static struct iommu_table *spapr_tce_get_default_table( + struct spapr_tce_iommu_group *data, phys_addr_t addr) +{ + struct iommu_table *tbl = data->iommu_owner; + + if (addr == TCE_DEFAULT_WINDOW) + return tbl; + + if ((addr >> tbl->it_page_shift) < tbl->it_size) + return tbl; + + return NULL; +} + +static struct spapr_tce_iommu_ops spapr_tce_default_ops = { + .get_table = spapr_tce_get_default_table +}; + void iommu_register_group(struct iommu_table *tbl, + void *iommu_owner, struct spapr_tce_iommu_ops *ops,
[PATCH v4 01/16] rcu: Define notrace version of list_for_each_entry_rcu and list_entry_rcu
This defines list_for_each_entry_rcu_notrace, which uses the new
list_entry_rcu_notrace, which in turn uses rcu_dereference_raw_notrace
instead of rcu_dereference_raw. This allows us to use
list_for_each_entry_rcu_notrace when the MMU is off (real mode).

Signed-off-by: Alexey Kardashevskiy
---
 include/linux/rculist.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 8183b46..a155774 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -253,6 +253,25 @@ static inline void list_splice_init_rcu(struct list_head *list,
 	})

 /**
+ * list_entry_rcu_notrace - get the struct for this entry
+ * @ptr:	the &struct list_head pointer.
+ * @type:	the type of the struct this is embedded in.
+ * @member:	the name of the list_struct within the struct.
+ *
+ * This primitive may safely run concurrently with the _rcu list-mutation
+ * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
+ *
+ * This is the same as list_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_entry_rcu_notrace(ptr, type, member) \
+({ \
+	typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
+	container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), \
+			type, member); \
+})
+
+/**
  * Where are list_empty_rcu() and list_first_entry_rcu()?
  *
  * Implementing those functions following their counterparts list_empty() and
@@ -308,6 +327,25 @@ static inline void list_splice_init_rcu(struct list_head *list,
 		pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

 /**
+ * list_for_each_entry_rcu_notrace - iterate over rcu list of given type
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_struct within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently with
+ * the _rcu list-mutation primitives such as list_add_rcu()
+ * as long as the traversal is guarded by rcu_read_lock().
+ *
+ * This is the same as list_for_each_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_for_each_entry_rcu_notrace(pos, head, member) \
+	for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \
+		&pos->member != (head); \
+		pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), \
+		member))
+
+/**
  * list_for_each_entry_continue_rcu - continue iteration over list of given type
  * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
--
2.0.0
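The shape of the two macros in this patch is the usual embedded-list pattern: convert a list node pointer back to its containing struct, then step to the next node. The sketch below reproduces that shape in userspace under clear assumptions: all toy_* names are hypothetical, there is no RCU here at all (a plain pointer load replaces rcu_dereference_raw_notrace()), and __typeof__ is a GCC/Clang extension.

```c
#include <stddef.h>

struct toy_list_head {
	struct toy_list_head *next, *prev;
};

/* Recover the containing struct from a pointer to its embedded member. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define toy_list_entry(ptr, type, member) \
	toy_container_of(ptr, type, member)

/* Same loop structure as the macro in the patch, minus the RCU parts. */
#define toy_list_for_each_entry(pos, head, member) \
	for (pos = toy_list_entry((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head); \
	     pos = toy_list_entry(pos->member.next, __typeof__(*pos), member))

struct toy_item {
	int value;
	struct toy_list_head node;
};

static void toy_list_init(struct toy_list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void toy_list_add_tail(struct toy_list_head *entry,
			      struct toy_list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Walk the list and add up the values of all items. */
static int toy_list_sum(struct toy_list_head *head)
{
	struct toy_item *pos;
	int sum = 0;

	toy_list_for_each_entry(pos, head, node)
		sum += pos->value;

	return sum;
}
```

In the kernel versions the only difference between the traced and notrace variants is which dereference primitive loads the next pointer; the traversal logic is the same as in this sketch.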
Re: [PATCH v4 16/16] vfio: powerpc/spapr: Enable Dynamic DMA windows
On 07/30/2014 07:31 PM, Alexey Kardashevskiy wrote: > This defines and implements VFIO IOMMU API required to support > Dynamic DMA windows defined in the SPAPR specification. The ioctl handlers > implement host-size part of corresponding RTAS calls: > - VFIO_IOMMU_SPAPR_TCE_QUERY - ibm,query-pe-dma-window; > - VFIO_IOMMU_SPAPR_TCE_CREATE - ibm,create-pe-dma-window; > - VFIO_IOMMU_SPAPR_TCE_REMOVE - ibm,remove-pe-dma-window; > - VFIO_IOMMU_SPAPR_TCE_RESET - ibm,reset-pe-dma-window. > > The VFIO IOMMU driver does basic sanity checks and calls corresponding > SPAPR TCE functions. At the moment only IODA2 (POWER8 PCI host bridge) > implements them. > > This advertises VFIO_IOMMU_SPAPR_TCE_FLAG_DDW capability via > VFIO_IOMMU_SPAPR_TCE_GET_INFO. > > This calls reset() when IOMMU is being disabled (happens when VFIO stops > using it). > > Signed-off-by: Alexey Kardashevskiy > --- > arch/powerpc/platforms/powernv/pci-ioda.c | 1 + > drivers/vfio/vfio_iommu_spapr_tce.c | 173 > +- > include/uapi/linux/vfio.h | 37 ++- > 3 files changed, 209 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c > b/arch/powerpc/platforms/powernv/pci-ioda.c > index 6a847b2..f51afe2 100644 > --- a/arch/powerpc/platforms/powernv/pci-ioda.c > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c > @@ -859,6 +859,7 @@ static long pnv_pci_ioda2_ddw_create(struct > spapr_tce_iommu_group *data, > > /* Copy "invalidate" register address */ > tbl64->it_index = pe->tce32.table.it_index; > + tbl64->it_group = pe->tce32.table.it_group; Just noticed. This does not belong here, this must be moved to earlier patch. 
> tbl64->it_type = TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE | > TCE_PCI_SWINV_PAIR; > tbl64->it_map = (void *) 0xDEADBEEF; /* poison */ > diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c > b/drivers/vfio/vfio_iommu_spapr_tce.c > index 48b256c..32e2804 100644 > --- a/drivers/vfio/vfio_iommu_spapr_tce.c > +++ b/drivers/vfio/vfio_iommu_spapr_tce.c > @@ -45,6 +45,7 @@ struct tce_container { > struct mutex lock; > struct iommu_group *grp; > bool enabled; > + unsigned long start64; > }; > > > @@ -123,19 +124,36 @@ static void tce_iommu_disable(struct tce_container > *container) > > container->enabled = false; > > - if (!container->grp || !current->mm) > + if (!container->grp) > return; > > data = iommu_group_get_iommudata(container->grp); > if (!data || !data->iommu_owner || !data->ops->get_table) > return; > > + /* Try resetting, there might have been a 64bit window */ > + if (data->ops->reset) > + data->ops->reset(data); > + > + if (!current->mm) > + return; > + > tbl = data->ops->get_table(data, TCE_DEFAULT_WINDOW); > if (!tbl) > return; > > decrement_locked_vm((tbl->it_size << tbl->it_page_shift) >> > PAGE_SHIFT); > + > + if (!container->start64) > + return; > + > + tbl = data->ops->get_table(data, container->start64); > + if (!tbl) > + return; > + > + decrement_locked_vm((tbl->it_size << tbl->it_page_shift) >> > + PAGE_SHIFT); > } > > static void *tce_iommu_open(unsigned long arg) > @@ -210,6 +228,8 @@ static long tce_iommu_ioctl(void *iommu_data, > info.dma32_window_start = tbl->it_offset << tbl->it_page_shift; > info.dma32_window_size = tbl->it_size << tbl->it_page_shift; > info.flags = 0; > + if (data->ops->query && data->ops->create && data->ops->remove) > + info.flags |= VFIO_IOMMU_SPAPR_TCE_FLAG_DDW; > > if (copy_to_user((void __user *)arg, &info, minsz)) > return -EFAULT; > @@ -335,6 +355,157 @@ static long tce_iommu_ioctl(void *iommu_data, > tce_iommu_disable(container); > mutex_unlock(&container->lock); > return 0; > + > + case VFIO_IOMMU_SPAPR_TCE_QUERY: 
{ > + struct vfio_iommu_spapr_tce_query query; > + struct spapr_tce_iommu_group *data; > + > + if (WARN_ON(!container->grp)) > + return -ENXIO; > + > + data = iommu_group_get_iommudata(container->grp); > + > + minsz = offsetofend(struct vfio_iommu_spapr_tce_query, > + page_size_mask); > + > + if (copy_from_user(&query, (void __user *)arg, minsz)) > + return -EFAULT; > + > + if (query.argsz < minsz) > + return -EINVAL; > + > + if (!data->ops->query || !data->iommu_owner) > + return -ENOSYS; > + > + ret = data->ops->query(data, > + &query.windows_available, > + &quer
[PATCH RFC] ASoC: fsl: Add Freescale Generic ASoC Sound Card with ASRC support
The Freescale Generic ASoC Sound Card is a general ASoC DAI Link driver that can be used, ideally, for all Freescale CPU DAI drivers and external CODECs.

The idea of this generic sound card is a bit like ASoC Simple Card. However, most Freescale SoCs (especially those released in recent years) have an ASRC (Documentation/devicetree/bindings/sound/fsl,asrc.txt) inside, and this is a specific feature that would be painstaking to control and to merge into the Simple Card driver. So having this driver will allow all Freescale SoC users to benefit from simpler support for new cards and from the wide sample rate support provided by ASRC.

The driver is initially designed for sound cards using I2S or PCM DAI formats. However, it is also possible to merge non-I2S/PCM sound cards, such as S/PDIF audio and HDMI audio, into this card, as long as the merge does not break the original function and there is something redundant that can be abstracted along with the I2S type sound cards.

As an initial version, it only supports the three cards that I can test:
 imx-audio-cs42888, a new card that links ESAI with the CS42888 CODEC
 imx-audio-sgtl5000, just like the old imx-sgtl5000.c driver
 imx-audio-wm8962, just like the old imx-wm8962.c driver

The driver is also compatible with the old Device Tree bindings of WM8962 and SGTL5000, so we may consider removing those two drivers once this driver is fully enabled.
(It needs to be added into defconfig) Signed-off-by: Nicolin Chen --- .../devicetree/bindings/sound/fsl-asoc-card.txt| 82 +++ sound/soc/fsl/Kconfig | 16 + sound/soc/fsl/Makefile | 2 + sound/soc/fsl/fsl-asoc-card.c | 573 + 4 files changed, 673 insertions(+) create mode 100644 Documentation/devicetree/bindings/sound/fsl-asoc-card.txt create mode 100644 sound/soc/fsl/fsl-asoc-card.c diff --git a/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt b/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt new file mode 100644 index 000..a96774c --- /dev/null +++ b/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt @@ -0,0 +1,82 @@ +Freescale Generic ASoC Sound Card with ASRC support + +The Freescale Generic ASoC Sound Card can be used, ideally, for all Freescale +SoCs connecting with external CODECs. + +The idea of this generic sound card is a bit like ASoC Simple Card. However, +for Freescale SoCs (especially those released in recent years), most of them +have ASRC (Documentation/devicetree/bindings/sound/fsl,asrc.txt) inside. And +this is a specific feature that might be painstakingly controlled and merged +into the Simple Card. + +So having this generic sound card allows all Freescale SoC users to benefit +from the simplification of a new card support and the capability of the wide +sample rates support through ASRC. + +Note: The card is initially designed for those sound cards who use I2S and + PCM DAI formats. However, it'll be also possible to support those non + I2S/PCM type sound cards, such as S/PDIF audio and HDMI audio, as long + as the driver has been properly upgraded. 
+ + +The compatible list for this generic sound card currently: + "fsl,imx-audio-cs42888" + + "fsl,imx-audio-wm8962" + (compatible with Documentation/devicetree/bindings/sound/imx-audio-wm8962.txt) + + "fsl,imx-audio-sgtl5000" + (compatible with Documentation/devicetree/bindings/sound/imx-audio-sgtl5000.txt) + +Required properties: + + - compatible : Contains one of entries in the compatible list. + + - model : The user-visible name of this sound complex + + - audio-cpu : The phandle of an CPU DAI controller + + - audio-codec: The phandle of an audio codec + + - audio-routing : A list of the connections between audio components. + Each entry is a pair of strings, the first being the + connection's sink, the second being the connection's + source. There're a few pre-designed board connectors: + * Line Out Jack + * Line In Jack + * Headphone Jack + * Mic Jack + * Ext Spk + * AMIC (stands for Analog Microphone Jack) + * DMIC (stands for Digital Microphone Jack) + + Note: The "Mic Jack" and "AMIC" are redundant while + coexsiting in order to support the old bindings + of wm8962 and sgtl5000. + +Optional properties: + + - audio-asrc : The phandle of ASRC. It can be absent if there's no + need to add ASRC support via DPCM. + +Example: +sound-cs42888 { + compatible = "fsl,imx-audio-cs42888"; + model = "cs42888-audio"; + audio-cpu = <&
[PATCH 3.12 71/94] locking/mutex: Disable optimistic spinning on some architectures
From: Peter Zijlstra 3.12-stable review patch. If anyone has any objections, please let me know. === commit 4badad352a6bb202ec68afa7a574c0bb961e5ebc upstream. The optimistic spin code assumes regular stores and cmpxchg() play nice; this is found to not be true for at least: parisc, sparc32, tile32, metag-lock1, arc-!llsc and hexagon. There is further wreckage, but this in particular seemed easy to trigger, so blacklist this. Opt in for known good archs. Signed-off-by: Peter Zijlstra Reported-by: Mikulas Patocka Cc: David Miller Cc: Chris Metcalf Cc: James Bottomley Cc: Vineet Gupta Cc: Jason Low Cc: Waiman Long Cc: "James E.J. Bottomley" Cc: Paul McKenney Cc: John David Anglin Cc: James Hogan Cc: Linus Torvalds Cc: Davidlohr Bueso Cc: Benjamin Herrenschmidt Cc: Catalin Marinas Cc: Russell King Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparcli...@vger.kernel.org Link: http://lkml.kernel.org/r/20140606175316.gv13...@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar Signed-off-by: Jiri Slaby --- arch/arm/Kconfig | 1 + arch/arm64/Kconfig | 1 + arch/powerpc/Kconfig | 1 + arch/sparc/Kconfig | 1 + arch/x86/Kconfig | 1 + kernel/Kconfig.locks | 5 - 6 files changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index e47fcd1e9645..99e1ce978cf9 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -5,6 +5,7 @@ config ARM select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAVE_CUSTOM_GPIO_H + select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_WANT_IPC_PARSE_VERSION select BUILDTIME_EXTABLE_SORT if MMU select CLONE_BACKWARDS diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index c04454876bcb..fe70eaea0e28 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1,6 +1,7 @@ config ARM64 def_bool y select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE + select ARCH_SUPPORTS_ATOMIC_RMW select 
ARCH_WANT_OPTIONAL_GPIOLIB select ARCH_WANT_COMPAT_IPC_PARSE_VERSION select ARCH_WANT_FRAME_POINTERS diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index d5d026b6d237..2e0ddfadc0b9 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -138,6 +138,7 @@ config PPC select OLD_SIGSUSPEND select OLD_SIGACTION if PPC32 select HAVE_DEBUG_STACKOVERFLOW + select ARCH_SUPPORTS_ATOMIC_RMW config EARLY_PRINTK bool diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 4e5683877b93..d60f34dbae89 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -75,6 +75,7 @@ config SPARC64 select ARCH_HAVE_NMI_SAFE_CMPXCHG select HAVE_C_RECORDMCOUNT select NO_BOOTMEM + select ARCH_SUPPORTS_ATOMIC_RMW config ARCH_DEFCONFIG string diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index eb2dfa61eabe..9dc1a24d41b8 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -123,6 +123,7 @@ config X86 select COMPAT_OLD_SIGACTION if IA32_EMULATION select RTC_LIB select HAVE_DEBUG_STACKOVERFLOW + select ARCH_SUPPORTS_ATOMIC_RMW config INSTRUCTION_DECODER def_bool y diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index d2b32ac27a39..ecee67a00f5f 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -220,6 +220,9 @@ config INLINE_WRITE_UNLOCK_IRQRESTORE endif +config ARCH_SUPPORTS_ATOMIC_RMW + bool + config MUTEX_SPIN_ON_OWNER def_bool y - depends on SMP && !DEBUG_MUTEXES + depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW -- 2.0.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
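The opt-in rule introduced by this patch can be modelled in a few lines. The following is only an illustrative Python sketch, not kernel code: the function name `mutex_spin_on_owner` and the dict-based configuration are hypothetical, and only the symbol names and the `depends on` expression are taken from the patch above.

```python
# Hypothetical model of the Kconfig rule from the patch:
#   config MUTEX_SPIN_ON_OWNER
#       def_bool y
#       depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW


def mutex_spin_on_owner(config):
    """Return True if optimistic spinning would be enabled for this config.

    `config` maps Kconfig symbol names to booleans. A symbol that is absent
    counts as 'n', which is what turns ARCH_SUPPORTS_ATOMIC_RMW into an
    opt-in: an architecture gets spinning only if it selects the symbol.
    """
    return (config.get("SMP", False)
            and not config.get("DEBUG_MUTEXES", False)
            and config.get("ARCH_SUPPORTS_ATOMIC_RMW", False))


# An architecture that selects ARCH_SUPPORTS_ATOMIC_RMW (PPC in the patch).
powerpc = {"SMP": True, "ARCH_SUPPORTS_ATOMIC_RMW": True}
# An architecture that does not select it (parisc, per the commit message).
parisc = {"SMP": True}

print(mutex_spin_on_owner(powerpc))  # True
print(mutex_spin_on_owner(parisc))   # False
```

The design choice worth noting is the direction of the default: architectures not listed in the patch lose optimistic spinning until someone verifies them and adds the `select`.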
Re: [PATCH 4/4] ASoC: fsl_ssi: Add stream names for DPCM usage
Nicolin Chen wrote:
> DPCM needs extra DAPM routes in the machine driver to route audio between
> the Front-End and Back-End. To distinguish the stream names in the route map
> from those of the CODECs, add specific stream names to the SSI driver so that
> ASRC can be implemented on it via DPCM.
>
> Signed-off-by: Nicolin Chen

Acked-by: Timur Tabi
Re: [2/2] powerpc/fsl-booke: Add initial T1042RDB_PI board support
On Wed, Jul 09, 2014 at 09:54:11AM +0530, Priyanka Jain wrote:
> diff --git a/arch/powerpc/boot/dts/t104xrdb.dtsi
> b/arch/powerpc/boot/dts/t104xrdb.dtsi
> index 9aaefa5..e7e765f 100644
> --- a/arch/powerpc/boot/dts/t104xrdb.dtsi
> +++ b/arch/powerpc/boot/dts/t104xrdb.dtsi
> @@ -57,7 +57,8 @@
>  	};
>
>  	cpld@3,0 {
> -		compatible = "fsl,t1040rdb-cpld","fsl,t1042rdb-cpld";
> +		compatible = "fsl,t1040rdb-cpld","fsl,t1042rdb-cpld",
> +				"fsl,t1042rdb_pi-cpld";
>  		reg = <3 0 0x300>;
>  	};
>  };

What's going on here? This file is used by all three boards. If you need to distinguish one board's CPLD from another's, you'll have to do it somewhere else. If the CPLDs are exactly the same and no distinction needs to be made, then you don't need three compatible strings. Even then, you may wish to specify the exact board as the first compatible string, but again you'll need to patch that in elsewhere so that it actually matches the board.

-Scott
Re: [PATCH v2 5/7] powerpc/corenet: Add MDIO bus muxing support to the board device tree(s)
Hello Scott, On 07/29/2014 02:58 PM, Scott Wood wrote: > On Mon, 2014-07-28 at 06:51 +, Emil Medve wrote: >> Hello Scott, >> >> >> Scott Wood freescale.com> writes: >>> On Wed, 2014-07-16 at 15:17 -0500, Shruti Kanetkar wrote: + mdio fd000 { + /* For 10g interfaces */ + phy_xaui_slot1: xaui-phy slot1 { + status = "disabled"; + compatible = "ethernet-phy-ieee802.3-c45"; + reg = <0x7>; /* default switch setting on slot1 of AMC2PEX */ + }; >>> >>> Why xaui-phy and not ethernet-phy? >>> >>> As for the device_type discussion from v1, there is a generic binding >>> that says device_type "should" be ethernet-phy. >> >> I have no strong feelings about this and we can use ethernet-phy, but: >> >> 1. The binding is old/stale (?) as it still uses device_type and the kernel >> doesn't seem to use anymore the device_type for PHY(s) > > Yes. > >> 2. The binding asks "ethernet-phy" for the device_type property, not for the >> name. As such TBI PHY(s) use (upstream) the tbi-phy@ node name > > It shows ethernet-phy as the name in the example. ePAPR urges generic > node names (this was also a recommendation for IEEE1275), and has > ethernet-phy on the preferred list. Is a xaui-phy not an ethernet phy? So you thinking somebody should cleanup all the sgmii-phy and tbi-phy node names, huh? It seems that a number of tbi-phy instances slipped by you: 1be62c6 powerpc/mpc85xx: Add BSC9132 QDS Support bf57aeb powerpc/85xx: add the P1020RDB-PD DTS support 8a6be2b powerpc/85xx: Add TWR-P1025 board support + mdio0: mdio fc000 { + }; >>> >>> Why is the empty node needed? >> >> For the label > > For mdio-parent-bus, or is there some other dts layer that makes this > node non-empty? 
'powerpc/corenet: Create the dts components for the DPAA FMan' - http://patchwork.ozlabs.org/patch/370872 and 'powerpc/corenet: Add DPAA FMan support to the SoC device tree(s)' - http://patchwork.ozlabs.org/patch/370868 add content to said node Cheers,
Re: [PATCH v2 5/7] powerpc/corenet: Add MDIO bus muxing support to the board device tree(s)
On Wed, 2014-07-30 at 16:52 -0500, Emil Medve wrote: > Hello Scott, > > > On 07/29/2014 02:58 PM, Scott Wood wrote: > > On Mon, 2014-07-28 at 06:51 +, Emil Medve wrote: > >> Hello Scott, > >> > >> > >> Scott Wood freescale.com> writes: > >>> On Wed, 2014-07-16 at 15:17 -0500, Shruti Kanetkar wrote: > +mdio fd000 { > +/* For 10g interfaces */ > +phy_xaui_slot1: xaui-phy slot1 { > +status = "disabled"; > +compatible = > "ethernet-phy-ieee802.3-c45"; > +reg = <0x7>; /* default switch > setting on slot1 of AMC2PEX */ > +}; > >>> > >>> Why xaui-phy and not ethernet-phy? > >>> > >>> As for the device_type discussion from v1, there is a generic binding > >>> that says device_type "should" be ethernet-phy. > >> > >> I have no strong feelings about this and we can use ethernet-phy, but: > >> > >> 1. The binding is old/stale (?) as it still uses device_type and the kernel > >> doesn't seem to use anymore the device_type for PHY(s) > > > > Yes. > > > >> 2. The binding asks "ethernet-phy" for the device_type property, not for > >> the > >> name. As such TBI PHY(s) use (upstream) the tbi-phy@ node name > > > > It shows ethernet-phy as the name in the example. ePAPR urges generic > > node names (this was also a recommendation for IEEE1275), and has > > ethernet-phy on the preferred list. Is a xaui-phy not an ethernet phy? > > So you thinking somebody should cleanup all the sgmii-phy and tbi-phy > node names, huh? No, I was just wondering why we're adding yet another name, and whether there's any value in it. > It seems that a number of tbi-phy instances slipped by you: > > 1be62c6 powerpc/mpc85xx: Add BSC9132 QDS Support > bf57aeb powerpc/85xx: add the P1020RDB-PD DTS support > 8a6be2b powerpc/85xx: Add TWR-P1025 board support tbi-phy is existing practice. xaui-phy isn't. > +mdio0: mdio fc000 { > +}; > >>> > >>> Why is the empty node needed? > >> > >> For the label > > > > For mdio-parent-bus, or is there some other dts layer that makes this > > node non-empty? 
> > 'powerpc/corenet: Create the dts components for the DPAA FMan' - > http://patchwork.ozlabs.org/patch/370872 Why does this patch define the mdio0 label for mdio@e1120, but not define a label for any other node? > and 'powerpc/corenet: Add DPAA > FMan support to the SoC device tree(s)' - > http://patchwork.ozlabs.org/patch/370868 add content to said node This one adds content to some mdio nodes, none of which are mdio@fc000 or &mdio0. -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] ASoC: fsl_asrc: Fix sparse warnings in FSL_ASRC_FORMATS due to typo
reproduce: make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

>> sound/soc/fsl/fsl_asrc.c:563:28: sparse: restricted snd_pcm_format_t degrades to integer
>> sound/soc/fsl/fsl_asrc.c:570:28: sparse: restricted snd_pcm_format_t degrades to integer

vim +563 sound/soc/fsl/fsl_asrc.c

   557		.probe = fsl_asrc_dai_probe,
   558		.playback = {
   559			.stream_name = "ASRC-Playback",
   560			.channels_min = 1,
   561			.channels_max = 10,
   562			.rates = FSL_ASRC_RATES,
 > 563			.formats = FSL_ASRC_FORMATS,
   564		},
   565		.capture = {
   566			.stream_name = "ASRC-Capture",
   567			.channels_min = 1,
   568			.channels_max = 10,
   569			.rates = FSL_ASRC_RATES,
 > 570			.formats = FSL_ASRC_FORMATS,
   571		},
   572		.ops = &fsl_asrc_dai_ops,
   573	};

Reported-by: kbuild test robot
Signed-off-by: Nicolin Chen
---
 sound/soc/fsl/fsl_asrc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index 41699f7..cdb5779 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -551,7 +551,7 @@ static int fsl_asrc_dai_probe(struct snd_soc_dai *dai)
 #define FSL_ASRC_RATES		 SNDRV_PCM_RATE_8000_192000
 #define FSL_ASRC_FORMATS	(SNDRV_PCM_FMTBIT_S24_LE | \
 				 SNDRV_PCM_FMTBIT_S16_LE | \
-				 SNDRV_PCM_FORMAT_S20_3LE)
+				 SNDRV_PCM_FMTBIT_S20_3LE)

 static struct snd_soc_dai_driver fsl_asrc_dai = {
 	.probe = fsl_asrc_dai_probe,
--
1.8.4
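Why the one-word typo matters can be shown with a small model. This is a hedged Python sketch rather than kernel code: `fmtbit` and `formats_in` are hypothetical helpers, and the numeric format indices are the values as I recall them from include/uapi/sound/asound.h, so treat them as assumptions. The point is that a `SNDRV_PCM_FORMAT_*` value is an index, while a `SNDRV_PCM_FMTBIT_*` value is the single bit at that index.

```python
# Format indices, assumed to mirror include/uapi/sound/asound.h.
SNDRV_PCM_FORMAT_S16_LE = 2
SNDRV_PCM_FORMAT_U16_BE = 5
SNDRV_PCM_FORMAT_S24_LE = 6
SNDRV_PCM_FORMAT_S20_3LE = 36


def fmtbit(fmt):
    """Model of the SNDRV_PCM_FMTBIT_* macros: one bit per format index."""
    return 1 << fmt


def formats_in(mask):
    """Return the format indices whose bit is set in a 64-bit format mask."""
    return [i for i in range(64) if mask & (1 << i)]


# Before the fix: the raw index 36 is OR-ed into the mask instead of bit 36.
buggy = (fmtbit(SNDRV_PCM_FORMAT_S24_LE)
         | fmtbit(SNDRV_PCM_FORMAT_S16_LE)
         | SNDRV_PCM_FORMAT_S20_3LE)

# After the fix: every term is a proper format bit.
fixed = (fmtbit(SNDRV_PCM_FORMAT_S24_LE)
         | fmtbit(SNDRV_PCM_FORMAT_S16_LE)
         | fmtbit(SNDRV_PCM_FORMAT_S20_3LE))

print(formats_in(buggy))  # [2, 5, 6]
print(formats_in(fixed))  # [2, 6, 36]
```

Under these assumed values the unfixed mask advertises index 5 (U16_BE) and omits S20_3LE entirely, which is the kind of mistake the sparse type annotation on `snd_pcm_format_t` is meant to catch.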
RE: [2/2] powerpc/fsl-booke: Add initial T1042RDB_PI board support
-Original Message-
From: Wood Scott-B07421
Sent: Thursday, July 31, 2014 1:43 AM
To: Jain Priyanka-B32167
Cc: devicet...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Aggrwal Poonam-B10812; Kushwaha Prabhakar-B32579
Subject: Re: [2/2] powerpc/fsl-booke: Add initial T1042RDB_PI board support

On Wed, Jul 09, 2014 at 09:54:11AM +0530, Priyanka Jain wrote:
> diff --git a/arch/powerpc/boot/dts/t104xrdb.dtsi
> b/arch/powerpc/boot/dts/t104xrdb.dtsi
> index 9aaefa5..e7e765f 100644
> --- a/arch/powerpc/boot/dts/t104xrdb.dtsi
> +++ b/arch/powerpc/boot/dts/t104xrdb.dtsi
> @@ -57,7 +57,8 @@
>  	};
>
>  	cpld@3,0 {
> -		compatible = "fsl,t1040rdb-cpld","fsl,t1042rdb-cpld";
> +		compatible = "fsl,t1040rdb-cpld","fsl,t1042rdb-cpld",
> +				"fsl,t1042rdb_pi-cpld";
>  		reg = <3 0 0x300>;
>  	};
>  };

What's going on here? This file is used by all three boards. If you need to distinguish one board's CPLD from another's, you'll have to do it somewhere else. If the CPLDs are exactly the same and no distinction needs to be made, then you don't need three compatible strings. Even then, you may wish to specify the exact board as the first compatible string, but again you'll need to patch that in elsewhere so that it actually matches the board.

As the register set of the CPLD is the same for all three boards, I am thinking of replacing this with t104xrdb-cpld:
compatible = "fsl,t104xrdb-cpld","
Is this OK?

-Scott
Re: [PATCH v2 5/7] powerpc/corenet: Add MDIO bus muxing support to the board device tree(s)
Hello Scott, On 07/30/2014 09:30 PM, Scott Wood wrote: > On Wed, 2014-07-30 at 16:52 -0500, Emil Medve wrote: >> Hello Scott, >> >> >> On 07/29/2014 02:58 PM, Scott Wood wrote: >>> On Mon, 2014-07-28 at 06:51 +, Emil Medve wrote: Hello Scott, Scott Wood freescale.com> writes: > On Wed, 2014-07-16 at 15:17 -0500, Shruti Kanetkar wrote: >> +mdio fd000 { >> +/* For 10g interfaces */ >> +phy_xaui_slot1: xaui-phy slot1 { >> +status = "disabled"; >> +compatible = >> "ethernet-phy-ieee802.3-c45"; >> +reg = <0x7>; /* default switch >> setting on slot1 of AMC2PEX */ >> +}; > > Why xaui-phy and not ethernet-phy? > > As for the device_type discussion from v1, there is a generic binding > that says device_type "should" be ethernet-phy. I have no strong feelings about this and we can use ethernet-phy, but: 1. The binding is old/stale (?) as it still uses device_type and the kernel doesn't seem to use anymore the device_type for PHY(s) >>> >>> Yes. >>> 2. The binding asks "ethernet-phy" for the device_type property, not for the name. As such TBI PHY(s) use (upstream) the tbi-phy@ node name >>> >>> It shows ethernet-phy as the name in the example. ePAPR urges generic >>> node names (this was also a recommendation for IEEE1275), and has >>> ethernet-phy on the preferred list. Is a xaui-phy not an ethernet phy? >> >> So you thinking somebody should cleanup all the sgmii-phy and tbi-phy >> node names, huh? > > No, I was just wondering why we're adding yet another name, and whether > there's any value in it. That's fair. We'll just use ethernet-phy >> It seems that a number of tbi-phy instances slipped by you: >> >> 1be62c6 powerpc/mpc85xx: Add BSC9132 QDS Support >> bf57aeb powerpc/85xx: add the P1020RDB-PD DTS support >> 8a6be2b powerpc/85xx: Add TWR-P1025 board support > > tbi-phy is existing practice. xaui-phy isn't. > >> +mdio0: mdio fc000 { >> +}; > > Why is the empty node needed? 
For the label >>> >>> For mdio-parent-bus, or is there some other dts layer that makes this >>> node non-empty? >> >> 'powerpc/corenet: Create the dts components for the DPAA FMan' - >> http://patchwork.ozlabs.org/patch/370872 > > Why does this patch define the mdio0 label for mdio@e1120, but not > define a label for any other node? Only MDIO controllers that are pinned out have these labels. Only pinned out MDIO(s) are capable of controlling external PHY(s) via these board level MDIO buses >> and 'powerpc/corenet: Add DPAA >> FMan support to the SoC device tree(s)' - >> http://patchwork.ozlabs.org/patch/370868 add content to said node > > This one adds content to some mdio nodes, none of which are mdio@fc000 > or &mdio0. This patch adds the SoC level PHY(s), which in this case are just TBI PHY(s): i.e. no FMan v2 10 Gb/s MDIO or FMan v3 standalone MDIO devices. Also the labels become relevant only at board level to connect the MDIO buses to their corresponding MDIO controllers Cheers, ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [2/2] powerpc/fsl-booke: Add initial T1042RDB_PI board support
On Wed, 2014-07-30 at 23:37 -0500, Jain Priyanka-B32167 wrote:
>
> -Original Message-
> From: Wood Scott-B07421
> Sent: Thursday, July 31, 2014 1:43 AM
> To: Jain Priyanka-B32167
> Cc: devicet...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Aggrwal
> Poonam-B10812; Kushwaha Prabhakar-B32579
> Subject: Re: [2/2] powerpc/fsl-booke: Add initial T1042RDB_PI board support
>
> On Wed, Jul 09, 2014 at 09:54:11AM +0530, Priyanka Jain wrote:
> > diff --git a/arch/powerpc/boot/dts/t104xrdb.dtsi
> > b/arch/powerpc/boot/dts/t104xrdb.dtsi
> > index 9aaefa5..e7e765f 100644
> > --- a/arch/powerpc/boot/dts/t104xrdb.dtsi
> > +++ b/arch/powerpc/boot/dts/t104xrdb.dtsi
> > @@ -57,7 +57,8 @@
> >  	};
> >
> >  	cpld@3,0 {
> > -		compatible = "fsl,t1040rdb-cpld","fsl,t1042rdb-cpld";
> > +		compatible = "fsl,t1040rdb-cpld","fsl,t1042rdb-cpld",
> > +				"fsl,t1042rdb_pi-cpld";
> >  		reg = <3 0 0x300>;
> >  	};
> >  };
>
> What's going on here? This file is used by all three boards. If you need to
> distinguish one board's CPLD from another's, you'll have to do it somewhere
> else. If the CPLDs are exactly the same and no distinction needs to be made,
> then you don't need three compatible strings. Even then, you may wish to
> specify the exact board as the first compatible string, but again you'll need
> to patch that in elsewhere so that it actually matches the board.
>
> As the register set of CPLD for all three boards is same, I am thinking of
> replacing this with t104xrdb-cpld
> compatible = "fsl,t104xrdb-cpld","
> Is this OK?

No. Wildcards aren't allowed in compatible strings, because you never know what other devices might exist in the future that match the wildcard. If the CPLD logic is truly 100% identical, just pick one of the three to be the canonical name.

-Scott
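The "canonical name" suggestion can be sketched as follows. This is a simplified Python model under stated assumptions, not the kernel's matching code: `match_compatible` is a hypothetical helper, matching is reduced to exact string equality, and the choice of "fsl,t1040rdb-cpld" as the canonical name is mine for illustration only. It shows that a compatible entry is a literal name, so an "x" in it expands to nothing, and that listing the exact board first with a shared canonical name second keeps both generic and board-specific matching possible.

```python
def match_compatible(node_compatible, driver_table):
    """Return the first entry of the node's compatible list that the driver
    knows, or None. Entries are compared as literal strings; nothing in the
    name is treated as a pattern."""
    for entry in node_compatible:
        if entry in driver_table:
            return entry
    return None


# Hypothetical board node: exact board first, canonical name second.
t1042rdb_pi_cpld = ["fsl,t1042rdb_pi-cpld", "fsl,t1040rdb-cpld"]

# A driver that only knows the canonical name still binds.
generic_driver = {"fsl,t1040rdb-cpld"}
print(match_compatible(t1042rdb_pi_cpld, generic_driver))

# A driver that later needs a board-specific quirk can match the exact name.
quirk_driver = {"fsl,t1042rdb_pi-cpld", "fsl,t1040rdb-cpld"}
print(match_compatible(t1042rdb_pi_cpld, quirk_driver))

# A wildcard-looking name is just another literal string.
wildcard_driver = {"fsl,t104xrdb-cpld"}
print(match_compatible(["fsl,t1040rdb-cpld"], wildcard_driver))
```

Because the node's list is ordered from most to least specific, the first hit is always the most specific name the driver understands.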
Re: [PATCH v2 5/7] powerpc/corenet: Add MDIO bus muxing support to the board device tree(s)
On Wed, 2014-07-30 at 23:35 -0500, Emil Medve wrote:
> Hello Scott,
>
>
> On 07/30/2014 09:30 PM, Scott Wood wrote:
> > On Wed, 2014-07-30 at 16:52 -0500, Emil Medve wrote:
> >>>>>> +	mdio0: mdio fc000 {
> >>>>>> +	};
> >>>>>
> >>>>> Why is the empty node needed?
> >>>>
> >>>> For the label
> >>>
> >>> For mdio-parent-bus, or is there some other dts layer that makes this
> >>> node non-empty?
> >>
> >> 'powerpc/corenet: Create the dts components for the DPAA FMan' -
> >> http://patchwork.ozlabs.org/patch/370872
> >
> > Why does this patch define the mdio0 label for mdio@e1120, but not
> > define a label for any other node?
>
> Only MDIO controllers that are pinned out have these labels. Only pinned
> out MDIO(s) are capable of controlling external PHY(s) via these board
> level MDIO buses

Is there any reason to describe non-pinned-out MDIO controllers at all?

Is the lack of pinning out inherent to the silicon, or is it board design/config? Is the answer different for different MDIO controllers?

I'm just curious why mdio@e1120 is labelled in a non-board dtsi while others are labelled elsewhere.

-Scott
Re: [PATCH v2 5/7] powerpc/corenet: Add MDIO bus muxing support to the board device tree(s)
Hello Scott,


On 07/31/2014 12:28 AM, Scott Wood wrote:
> On Wed, 2014-07-30 at 23:35 -0500, Emil Medve wrote:
>> Hello Scott,
>>
>>
>> On 07/30/2014 09:30 PM, Scott Wood wrote:
>>> On Wed, 2014-07-30 at 16:52 -0500, Emil Medve wrote:
>>>>>>>> +	mdio0: mdio fc000 {
>>>>>>>> +	};
>>>>>>>
>>>>>>> Why is the empty node needed?
>>>>>>
>>>>>> For the label
>>>>>
>>>>> For mdio-parent-bus, or is there some other dts layer that makes this
>>>>> node non-empty?
>>>>
>>>> 'powerpc/corenet: Create the dts components for the DPAA FMan' -
>>>> http://patchwork.ozlabs.org/patch/370872
>>>
>>> Why does this patch define the mdio0 label for mdio@e1120, but not
>>> define a label for any other node?
>>
>> Only MDIO controllers that are pinned out have these labels. Only pinned
>> out MDIO(s) are capable of controlling external PHY(s) via these board
>> level MDIO buses
>
> Is there any reason to describe non-pinned-out MDIO controllers at all?

Yes, for the internal TBI PHY(s). Each MAC supporting SGMII has a TBI PHY that is attached to the MDIO controller of the respective MAC.

> Is the lack of pinning out inherent to the silicon, or is it board
> design/config?

It's a silicon level decision.

> Is the answer different for different MDIO controllers?

You mean non-FSL MDIO controllers? Dunno. All FSL SoCs have the same MDIO pin-out decision.

> I'm just curious why mdio@e1120 is labelled in a non-board dtsi while
> others are labelled elsewhere.

Labels are relevant only in the context of 'powerpc/corenet: Add MDIO bus muxing support to the board device tree(s)' - http://patchwork.ozlabs.org/patch/370866. Most labels are created and used in the board .dts file, except for b4qds.dtsi, which is shared between b4420qds.dts and b4860qds.dts.

Cheers,
Re: [PATCH V7 00/17] Enable SRIOV on POWER8
On Thu, 2014-07-24 at 14:22 +0800, Wei Yang wrote: > This patch set enables the SRIOV on POWER8. Hi Bjorn ! There are 4 patches in there to the generic code, but so far not much review from your side of the fence :-) How do you want to proceed ? Cheers, Ben. > The gerneral idea is put each VF into one individual PE and allocate required > resources like DMA/MSI. > > One thing special for VF PE is we use M64BT to cover the IOV BAR. M64BT is one > hardware on POWER platform to map MMIO address to PE. By using M64BT, we could > map one individual VF to a VF PE, which introduce more flexiblity to users. > > To achieve this effect, we need to do some hack on pci devices's resources. > 1. Expand the IOV BAR properly. >Done by pnv_pci_ioda_fixup_iov_resources(). > 2. Shift the IOV BAR properly. >Done by pnv_pci_vf_resource_shift(). > 3. IOV BAR alignment is the total size instead of an individual size on >powernv platform. >Done by pnv_pcibios_sriov_resource_alignment(). > 4. Take the IOV BAR alignment into consideration in the sizing and assigning. >This is achieved by commit: "PCI: Take additional IOV BAR alignment in >sizing and assigning" > > Test Environment: >The SRIOV device tested is Emulex Lancer and Mellanox ConnectX-3 on >POWER8. > > Examples on pass through a VF to guest through vfio: > 1. install necessary modules > modprobe vfio > modprobe vfio-pci > 2. retrieve the iommu_group the device belongs to > readlink /sys/bus/pci/devices/:06:0d.0/iommu_group > ../../../../kernel/iommu_groups/26 > This means it belongs to group 26 > 3. see how many devices under this iommu_group > ls /sys/kernel/iommu_groups/26/devices/ > 4. unbind the original driver and bind to vfio-pci driver > echo :06:0d.0 > /sys/bus/pci/devices/:06:0d.0/driver/unbind > echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id > Note: this should be done for each device in the same iommu_group > 5. 
Start qemu and pass device through vfio > /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \ > -M pseries -m 2048 -enable-kvm -nographic \ > -drive file=/home/ywywyang/kvm/fc19.img \ > -monitor telnet:localhost:5435,server,nowait -boot cd \ > -device > "spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6" > > Verify this is the exact VF response: > 1. ping from a machine in the same subnet(the broadcast domain) > 2. run arp -n on this machine > 9.115.251.20 ether 00:00:c9:df:ed:bf C eth0 > 3. ifconfig in the guest > # ifconfig eth1 > eth1: flags=4163 mtu 1500 > inet 9.115.251.20 netmask 255.255.255.0 broadcast > 9.115.251.255 > inet6 fe80::200:c9ff:fedf:edbf prefixlen 64 scopeid 0x20 > ether 00:00:c9:df:ed:bf txqueuelen 1000 (Ethernet) > RX packets 175 bytes 13278 (12.9 KiB) > RX errors 0 dropped 0 overruns 0 frame 0 > TX packets 58 bytes 9276 (9.0 KiB) > TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 > 4. They have the same MAC address > > Note: make sure you shutdown other network interfaces in guest. > > --- > v6 -> v7: >1. add IORESOURCE_ARCH flag for IOV BAR on powernv platform. >2. when IOV BAR has IORESOURCE_ARCH flag, the size is retrieved from > hardware directly. If not, calculate as usual. >3. reorder the patch set, group them by subsystem: > PCI, powerpc, powernv >4. rebase it on 3.16-rc6 > v5 -> v6: >1. remove pcibios_enable_sriov()/pcibios_disable_sriov() weak function > similar function is moved to > pnv_pci_enable_device_hook()/pnv_pci_disable_device_hook(). When PF is > enabled, platform will try best to allocate resources for VFs. >2. remove pcibios_sriov_resource_size weak function >3. VF BAR size is retrieved from hardware directly in virtfn_add() > v4 -> v5: >1. merge those SRIOV related platform functions in machdep_calls > wrap them in one CONFIG_PCI_IOV marco >2. define IODA_INVALID_M64 to replace (-1) > use this value to represent the m64_wins is not used >3. 
rename pnv_pci_release_dev_dma() to pnv_pci_ioda2_release_dma_pe() > this function is a conterpart to pnv_pci_ioda2_setup_dma_pe() >4. change dev_info() to dev_dgb() in pnv_pci_ioda_fixup_iov_resources() > reduce some log in kernel >5. release M64 window in pnv_pci_ioda2_release_dma_pe() > v3 -> v4: >1. code format fix, eg. not exceed 80 chars >2. in commit "ppc/pnv: Add function to deconfig a PE" > check the bus has a bridge before print the name > remove a PE from its own PELTV >3. change the function name for sriov resource size/alignment >4. rebase on 3.16-rc3 >5. VFs will n