scheduler crash on Power

2014-07-30 Thread Sukadev Bhattiprolu

I am getting this crash on a PowerPC system running a 3.16.0-rc7 kernel plus
some perf-related patches (24x7 counters) that Cody Schafer posted here:

https://lkml.org/lkml/2014/5/27/768

I don't get the crash on an unpatched kernel, though.

I have been staring at the perf event patches, but can't find anything
that affects the scheduler. Besides, the patches worked on a 3.16.0-rc2
kernel on a different Power system.

The crash occurs on an idle system, a minute or two after booting to
runlevel 3.

kernel/sched/core.c:

---
5877 static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
5878 {
5879 	struct sched_group *sg = sd->groups;
5880 
5881 	WARN_ON(!sg);
5882 
5883 	do {
5884 		sg->group_weight = cpumask_weight(sched_group_cpus(sg));

---
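The attached log shows the WARN_ON(!sg) at line 5881 firing, followed by a bad data access at address 0x0018. WARN_ON() only logs and then continues, so the do-loop would still run with sg == NULL. In that case sched_group_cpus(sg) yields NULL plus the offset of the trailing cpumask array. The userspace sketch below computes that offset. It assumes the 3.16 layout of struct sched_group (next, ref, group_weight, sgc, cpumask[]); the *_model types are stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed userspace model of the 3.16 struct sched_group layout;
 * field names follow the kernel, types are simplified stand-ins. */
struct sched_group_capacity_model;

struct sched_group_model {
	struct sched_group_model *next;          /* circular list */
	struct { int counter; } ref;             /* atomic_t */
	unsigned int group_weight;
	struct sched_group_capacity_model *sgc;
	unsigned long cpumask[];                 /* variable-length CPU mask */
};

/* Address that sched_group_cpus(sg) would produce for a given sg value:
 * the base pointer plus the offset of the trailing cpumask array. */
static uintptr_t cpumask_addr_for(uintptr_t sg_addr)
{
	return sg_addr + offsetof(struct sched_group_model, cpumask);
}
```

On a 64-bit build the offset comes out as 0x18. That is consistent with the fault being taken inside cpumask_weight()'s bitmap code rather than at the WARN_ON site. This is an inference from the assumed layout; the log alone does not establish it.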


I tried applying the patch discussed in https://lkml.org/lkml/2014/7/16/386
but it doesn't seem to help.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bc1638b..50702a8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5842,6 +5842,8 @@ build_sched_groups(struct sched_domain *sd, int cpu)
 			continue;
 
 		group = get_group(i, sdd, &sg);
+		cpumask_clear(sched_group_cpus(sg));
+		sg->sgc->capacity = 0;
 		cpumask_setall(sched_group_mask(sg));
 
 		for_each_cpu(j, span) {


I am also attaching the debug messages that Peterz added
here: https://lkml.org/lkml/2014/7/17/288

I'd appreciate any debugging suggestions.

Sukadev



Red Hat Enterprise Linux Server 7.0 (Maipo)
Kernel 3.16.0-rc7-24x7+ on an ppc64

ltcbrazos2-lp07 login: 

Red Hat Enterprise Linux Server 7.0 (Maipo)
Kernel 3.16.0-rc7-24x7+ on an ppc64

ltcbrazos2-lp07 login: [  181.915974] [ cut here ]
[  181.915991] WARNING: at ../kernel/sched/core.c:5881
[  181.915994] Modules linked in: sg cfg80211 rfkill nx_crypto ibmveth pseries_rng xfs libcrc32c sd_mod crc_t10dif crct10dif_common ibmvscsi scsi_transport_srp scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[  181.916024] CPU: 4 PID: 1087 Comm: kworker/4:2 Not tainted 3.16.0-rc7-24x7+ #15
[  181.916034] Workqueue: events .topology_work_fn
[  181.916038] task: c000dbd4 ti: c000da40 task.ti: c000da40
[  181.916043] NIP: c00d7528 LR: c00d7578 CTR: 
[  181.916047] REGS: c000da403580 TRAP: 0700   Not tainted  (3.16.0-rc7-24x7+)
[  181.916051] MSR: 800100029032   CR: 28484c24  XER: 

[  181.916063] CFAR: c00d74f4 SOFTE: 1 
GPR00: c00d7578 c000da403800 c0eaa7f0 0800 
GPR04: 0800 0800  c09cf878 
GPR08: c09cf880 0001 0010  
GPR12:  cebe1200 0800 c000cc2f 
GPR16: c0ef0a68 0078 c000e500 0078 
GPR20:  0001 c000cc2f 0001 
GPR24: c0db4402 000f  c000dea39300 
GPR28: c0ef0ae0 c000e544  c0ef4f7c 
[  181.916146] NIP [c00d7528] .build_sched_domains+0xc28/0xd90
[  181.916151] LR [c00d7578] .build_sched_domains+0xc78/0xd90
[  181.916155] Call Trace:
[  181.916159] [c000da403800] [c00d7578] .build_sched_domains+0xc78/0xd90 (unreliable)
[  181.916166] [c000da403950] [c00d7950] .partition_sched_domains+0x260/0x3f0
[  181.916175] [c000da403a30] [c0141704] .rebuild_sched_domains_locked+0x54/0x70
[  181.916182] [c000da403ab0] [c0143a98] .rebuild_sched_domains+0x28/0x50
[  181.916188] [c000da403b30] [c004f250] .topology_work_fn+0x10/0x30
[  181.916194] [c000da403ba0] [c00b7100] .process_one_work+0x1a0/0x4c0
[  181.916199] [c000da403c40] [c00b7970] .worker_thread+0x180/0x630
[  181.916205] [c000da403d30] [c00bfc88] .kthread+0x108/0x130
[  181.916214] [c000da403e30] [c000a3e4] .ret_from_kernel_thread+0x58/0x74
[  181.916220] Instruction dump:
[  181.916223] 7f47492a e93c e90a0010 7d0a4378 7d4a482a 814a 2f8a 419e0008
[  181.916235] 7f48492a ebdd0010 7fc90074 7929d182 <0b09> 4814 6000 6000
[  181.916245] ---[ end trace 6e9d20016598c36c ]---
[  181.916253] Unable to handle kernel paging request for data at address 0x0018
[  181.916257] Faulting instruction address: 0xc039d1c0
[  181.916263] Oops: Kernel access of bad area, sig: 11 [#1]
[  181.916267] SMP NR_CPUS=2048 NUMA pSeries
[  181.916271] Modules linked in: sg cfg80211 rfkill nx_crypto ibmveth pseries_rng xfs libcrc32c sd_mod crc_t10dif crct10dif_common ibmvscsi scsi_transport_srp scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[  181.916293] CPU: 4 PID: 1087 Comm: kworker/4:2 Tainted: GW 3.16.0-rc7-24x7+ #15
[  181.916299] Workqueue: events .top

[PATCH v4 00/16] powernv: vfio: Add Dynamic DMA windows (DDW)

2014-07-30 Thread Alexey Kardashevskiy
This prepares the existing upstream kernel for DDW (Dynamic DMA Windows) and
adds actual DDW support to VFIO.

This patchset does not contain any in-kernel acceleration stuff.

This patchset does not enable DDW for emulated devices.


Changes:
v4:
* addressed Ben's comments
* big rework with moving tce_xxx callbacks out of ppc_md

v3:
* applied multiple comments from Gavin regarding error checking
and callback placement

v2:
* moved "Account TCE pages in locked_vm" here (was in later series)
* added counting for huge window to locked_vm (ugly but better than nothing)
* fixed bug with missing >>PAGE_SHIFT when calling pfn_to_page




Alexey Kardashevskiy (16):
  rcu: Define notrace version of list_for_each_entry_rcu and list_entry_rcu
  KVM: PPC: Use RCU for arch.spapr_tce_tables
  mm: Add helpers for locked_vm
  KVM: PPC: Account TCE-containing pages in locked_vm
  powerpc/iommu: Fix comments with it_page_shift
  powerpc/powernv: Make invalidate() a callback
  powerpc/spapr: vfio: Implement spapr_tce_iommu_ops
  powerpc/powernv: Convert/move set_bypass() callback to take_ownership()
  powerpc/iommu: Fix IOMMU ownership control functions
  powerpc: Move tce_xxx callbacks from ppc_md to iommu_table
  powerpc/powernv: Release replaced TCE
  powerpc/pseries/lpar: Enable VFIO
  powerpc/powernv: Implement Dynamic DMA windows (DDW) for IODA
  vfio: powerpc/spapr: Reuse locked_vm accounting helpers
  vfio: powerpc/spapr: Use it_page_size
  vfio: powerpc/spapr: Enable Dynamic DMA windows

 arch/powerpc/include/asm/iommu.h|  33 ++-
 arch/powerpc/include/asm/kvm_host.h |   1 +
 arch/powerpc/include/asm/machdep.h  |  25 --
 arch/powerpc/include/asm/tce.h  |  38 +++
 arch/powerpc/kernel/iommu.c | 158 -
 arch/powerpc/kernel/vio.c   |   5 +-
 arch/powerpc/kvm/book3s.c   |   2 +-
 arch/powerpc/kvm/book3s_64_vio.c|  43 +++-
 arch/powerpc/kvm/book3s_64_vio_hv.c |   6 +-
 arch/powerpc/platforms/cell/iommu.c |   9 +-
 arch/powerpc/platforms/pasemi/iommu.c   |   8 +-
 arch/powerpc/platforms/powernv/pci-ioda.c   | 239 ---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |   4 +-
 arch/powerpc/platforms/powernv/pci.c|  86 ---
 arch/powerpc/platforms/powernv/pci.h|  16 +-
 arch/powerpc/platforms/pseries/iommu.c  |  77 --
 arch/powerpc/sysdev/dart_iommu.c|  13 +-
 drivers/vfio/vfio_iommu_spapr_tce.c | 348 
 include/linux/mm.h  |   3 +
 include/linux/rculist.h |  38 +++
 include/uapi/linux/vfio.h   |  37 ++-
 mm/mlock.c  |  49 
 22 files changed, 990 insertions(+), 248 deletions(-)

-- 
2.0.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 02/16] KVM: PPC: Use RCU for arch.spapr_tce_tables

2014-07-30 Thread Alexey Kardashevskiy
At the moment spapr_tce_tables is not protected against races. This makes
use of RCU variants of the list helpers. As some bits are executed in real
mode, this makes use of the just-introduced list_for_each_entry_rcu_notrace().

This converts release_spapr_tce_table() to an RCU-scheduled handler.
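After this change, release_spapr_tce_table() receives only a pointer to the embedded struct rcu_head. It recovers the enclosing table with container_of(). The userspace sketch below shows that pointer arithmetic with offsetof(). CONTAINER_OF, the *_model structs and call_rcu_model() are illustrative stand-ins, not kernel code, and no real RCU grace period is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's container_of(). */
#define CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct rcu_head and the TCE table descriptor. */
struct rcu_head_model {
	struct rcu_head_model *next;
	void (*func)(struct rcu_head_model *head);
};

struct tce_table_model {
	unsigned long long liobn;
	unsigned int window_size;
	struct rcu_head_model rcu;   /* embedded, as in the patch */
};

static unsigned long long last_released_liobn;

/* In the style of the patched release_spapr_tce_table(): the callback only
 * gets the rcu_head and must find the enclosing table itself. */
static void release_table(struct rcu_head_model *head)
{
	struct tce_table_model *stt =
		CONTAINER_OF(head, struct tce_table_model, rcu);

	last_released_liobn = stt->liobn;
	free(stt);
}

/* Stand-in for call_rcu_sched(): here the callback runs immediately,
 * whereas the kernel defers it until after a grace period. */
static void call_rcu_model(struct rcu_head_model *head,
			   void (*func)(struct rcu_head_model *))
{
	head->func = func;
	head->func(head);
}
```

In the kernel, the deferral is the point of the change. Readers walking the list with list_for_each_entry_rcu_notrace() may still hold a pointer to the table after list_del_rcu(). The kfree() therefore has to wait until they are done.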

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
* total rework
* kfree() for kvmppc_spapr_tce_table is moved to call_rcu_sched() callback
* used new list_for_each_entry_rcu_notrace
---
 arch/powerpc/include/asm/kvm_host.h |  1 +
 arch/powerpc/kvm/book3s.c   |  2 +-
 arch/powerpc/kvm/book3s_64_vio.c| 23 +--
 arch/powerpc/kvm/book3s_64_vio_hv.c |  6 --
 4 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index bb66d8b..cd22c31 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -180,6 +180,7 @@ struct kvmppc_spapr_tce_table {
struct kvm *kvm;
u64 liobn;
u32 window_size;
+   struct rcu_head rcu;
struct page *pages[0];
 };
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c254c27..9e17d19 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -886,7 +886,7 @@ int kvmppc_core_init_vm(struct kvm *kvm)
 {
 
 #ifdef CONFIG_PPC64
-   INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
+   INIT_LIST_HEAD_RCU(&kvm->arch.spapr_tce_tables);
INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
 #endif
 
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 54cf9bc..5958f7d 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -45,19 +45,16 @@ static long kvmppc_stt_npages(unsigned long window_size)
 * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
-static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt)
+static void release_spapr_tce_table(struct rcu_head *head)
 {
-   struct kvm *kvm = stt->kvm;
+   struct kvmppc_spapr_tce_table *stt = container_of(head,
+   struct kvmppc_spapr_tce_table, rcu);
int i;
 
-   mutex_lock(&kvm->lock);
-   list_del(&stt->list);
for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
__free_page(stt->pages[i]);
+   kvm_put_kvm(stt->kvm);
kfree(stt);
-   mutex_unlock(&kvm->lock);
-
-   kvm_put_kvm(kvm);
 }
 
 static int kvm_spapr_tce_fault(struct vm_area_struct *vma, struct vm_fault 
*vmf)
@@ -87,8 +84,13 @@ static int kvm_spapr_tce_mmap(struct file *file, struct 
vm_area_struct *vma)
 static int kvm_spapr_tce_release(struct inode *inode, struct file *filp)
 {
struct kvmppc_spapr_tce_table *stt = filp->private_data;
+   struct kvm *kvm = stt->kvm;
+
+   mutex_lock(&kvm->lock);
+   list_del_rcu(&stt->list);
+   call_rcu_sched(&stt->rcu, release_spapr_tce_table);
+   mutex_unlock(&kvm->lock);
 
-   release_spapr_tce_table(stt);
return 0;
 }
 
@@ -106,7 +108,8 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
int i;
 
/* Check this LIOBN hasn't been previously allocated */
-   list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
+   list_for_each_entry_rcu_notrace(stt, &kvm->arch.spapr_tce_tables,
+   list) {
if (stt->liobn == args->liobn)
return -EBUSY;
}
@@ -131,7 +134,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
kvm_get_kvm(kvm);
 
mutex_lock(&kvm->lock);
-   list_add(&stt->list, &kvm->arch.spapr_tce_tables);
+   list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables);
 
mutex_unlock(&kvm->lock);
 
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c 
b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 89e96b3..b1914d9 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -50,7 +50,8 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long 
liobn,
/* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
/*  liobn, ioba, tce); */
 
-   list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
+   list_for_each_entry_rcu_notrace(stt, &kvm->arch.spapr_tce_tables,
+   list) {
if (stt->liobn == liobn) {
unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
struct page *page;
@@ -82,7 +83,8 @@ long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long 
liobn,
struct kvm *kvm = vcpu->kvm;
struct kvmppc_spapr_tce_table *stt;
 
-   list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
+   list_for_each_entry_rcu_notrace(stt, &kvm->arch.spapr_tce_tables,
+   list) {
if (stt->liobn == liobn) {
unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
struc

[PATCH v4 04/16] KVM: PPC: Account TCE-containing pages in locked_vm

2014-07-30 Thread Alexey Kardashevskiy
At the moment, pages used for TCE tables (not pages addressed by TCEs) are
not counted in the locked_vm counter, so a malicious userspace tool can
call ioctl(KVM_CREATE_SPAPR_TCE) as many times as RLIMIT_NOFILE allows and
lock a lot of memory.

This adds counting for pages used for TCE tables.

This counts the number of pages required for a table plus pages for
the kvmppc_spapr_tce_table struct (TCE table descriptor) itself.

This does not change the amount of (de)allocated memory.
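In kvmppc_account_memlimit() as posted in this patch, ALIGN(..., PAGE_SIZE) yields a size in bytes, and that byte count is added directly to npages. To count "pages for the kvmppc_spapr_tce_table struct" as described, the aligned size needs dividing by PAGE_SIZE. The userspace sketch below does that arithmetic. It assumes one 8-byte TCE per 4 KiB IOMMU page, with a shift of 12 standing in for SPAPR_TCE_SHIFT. The descriptor header size and page size are passed as parameters rather than taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed: 4 KiB IOMMU pages, one 64-bit TCE per page (SPAPR_TCE_SHIFT). */
#define TCE_SHIFT_ASSUMED 12

/* Round x up to a multiple of a (a power of two), like the kernel's ALIGN(). */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Pages needed for the TCE array of a DMA window of window_size bytes. */
static unsigned long tce_table_pages(unsigned long window_size,
				     unsigned long page_size)
{
	unsigned long tce_bytes = (window_size >> TCE_SHIFT_ASSUMED) *
				  sizeof(unsigned long long);

	return align_up(tce_bytes, page_size) / page_size;
}

/* Pages to charge to locked_vm: the TCE pages plus the pages used by the
 * descriptor, which holds one struct page pointer per TCE page.
 * align_up() yields bytes, so the result is divided by page_size. */
static unsigned long pages_to_account(unsigned long tce_pages,
				      unsigned long desc_header_size,
				      unsigned long page_size)
{
	unsigned long desc_bytes = desc_header_size + tce_pages * sizeof(void *);

	return tce_pages + align_up(desc_bytes, page_size) / page_size;
}
```

Take a 1 GiB window with 64 KiB pages and a small descriptor header. This charges 32 pages for the TCEs plus one page for the descriptor, 33 in total.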

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v4:
* fixed counting for kvmppc_spapr_tce_table (used to be +1 page)
* added 2 helpers to common MM code for later reuse from vfio-spapr
---
 arch/powerpc/kvm/book3s_64_vio.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 5958f7d..b32aeb1 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -45,16 +45,33 @@ static long kvmppc_stt_npages(unsigned long window_size)
 * sizeof(u64), PAGE_SIZE) / PAGE_SIZE;
 }
 
+static long kvmppc_account_memlimit(long npages, bool inc)
+{
+   long stt_pages = ALIGN(sizeof(struct kvmppc_spapr_tce_table) +
+   (abs(npages) * sizeof(struct page *)), PAGE_SIZE);
+
+   npages += stt_pages;
+   if (inc)
+   return try_increment_locked_vm(npages);
+
+   decrement_locked_vm(npages);
+
+   return 0;
+}
+
 static void release_spapr_tce_table(struct rcu_head *head)
 {
struct kvmppc_spapr_tce_table *stt = container_of(head,
struct kvmppc_spapr_tce_table, rcu);
int i;
+   long npages = kvmppc_stt_npages(stt->window_size);
 
-   for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
+   for (i = 0; i < npages; i++)
__free_page(stt->pages[i]);
kvm_put_kvm(stt->kvm);
kfree(stt);
+
+   kvmppc_account_memlimit(npages, false);
 }
 
 static int kvm_spapr_tce_fault(struct vm_area_struct *vma, struct vm_fault 
*vmf)
@@ -115,6 +132,9 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
}
 
npages = kvmppc_stt_npages(args->window_size);
+   ret = kvmppc_account_memlimit(npages, true);
+   if (ret)
+   goto fail;
 
stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *),
  GFP_KERNEL);
-- 
2.0.0

[PATCH v4 03/16] mm: Add helpers for locked_vm

2014-07-30 Thread Alexey Kardashevskiy
This adds 2 helpers to change the locked_vm counter:
- try_increment_locked_vm() - may fail if the new locked_vm value would
exceed the RLIMIT_MEMLOCK limit;
- decrement_locked_vm().

These will be used by drivers capable of locking memory at userspace
request. For example, VFIO can use them to check if it can lock DMA memory,
or PPC-KVM can use them to check if it can lock memory for TCE tables.
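The accounting rule in try_increment_locked_vm() and decrement_locked_vm() can be modelled in userspace. In this sketch, struct mm_model and its fields are hypothetical stand-ins for current->mm, RLIMIT_MEMLOCK >> PAGE_SHIFT and capable(CAP_IPC_LOCK). The mmap_sem locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the helpers consult. */
struct mm_model {
	long locked_vm;          /* pages currently accounted */
	long lock_limit_pages;   /* RLIMIT_MEMLOCK >> PAGE_SHIFT */
	bool has_cap_ipc_lock;   /* capable(CAP_IPC_LOCK) */
};

/* Mirrors try_increment_locked_vm(): fail if the new total would exceed
 * the limit (unless the task may ignore it), otherwise charge the pages. */
static long try_increment_model(struct mm_model *mm, long npages)
{
	long locked;

	if (!mm)
		return -ESRCH;   /* process exited */
	locked = mm->locked_vm + npages;
	if (locked > mm->lock_limit_pages && !mm->has_cap_ipc_lock)
		return -ENOMEM;
	mm->locked_vm = locked;
	return 0;
}

/* Mirrors decrement_locked_vm(): clamp so the counter never goes negative. */
static void decrement_model(struct mm_model *mm, long npages)
{
	if (!mm)
		return;
	if (npages > mm->locked_vm)
		npages = mm->locked_vm;
	mm->locked_vm -= npages;
}
```

Reaching the limit exactly is allowed; only exceeding it fails. With a 16-page limit, charging 16 pages succeeds and charging one more returns -ENOMEM.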

Signed-off-by: Alexey Kardashevskiy 
---
 include/linux/mm.h |  3 +++
 mm/mlock.c | 49 +
 2 files changed, 52 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e03dd29..1cb219d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2113,5 +2113,8 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern long try_increment_locked_vm(long npages);
+extern void decrement_locked_vm(long npages);
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/mlock.c b/mm/mlock.c
index b1eb536..39e4b55 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -864,3 +864,52 @@ void user_shm_unlock(size_t size, struct user_struct *user)
spin_unlock(&shmlock_user_lock);
free_uid(user);
 }
+
+/**
+ * try_increment_locked_vm() - checks if the new locked_vm value would stay
+ * within RLIMIT_MEMLOCK and increments it by npages if so.
+ *
+ * @npages: the number of pages to add to locked_vm.
+ *
+ * Returns 0 if succeeded or negative value if failed.
+ */
+long try_increment_locked_vm(long npages)
+{
+   long ret = 0, locked, lock_limit;
+
+   if (!current || !current->mm)
+   return -ESRCH; /* process exited */
+
+   down_write(&current->mm->mmap_sem);
+   locked = current->mm->locked_vm + npages;
+   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+   if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
+   pr_warn("RLIMIT_MEMLOCK (%ld) exceeded\n",
+   rlimit(RLIMIT_MEMLOCK));
+   ret = -ENOMEM;
+   } else {
+   current->mm->locked_vm += npages;
+   }
+   up_write(&current->mm->mmap_sem);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(try_increment_locked_vm);
+
+/**
+ * decrement_locked_vm() - decrements the current task's locked_vm counter.
+ *
+ * @npages: the number to decrement by.
+ */
+void decrement_locked_vm(long npages)
+{
+   if (!current || !current->mm)
+   return; /* process exited */
+
+   down_write(&current->mm->mmap_sem);
+   if (npages > current->mm->locked_vm)
+   npages = current->mm->locked_vm;
+   current->mm->locked_vm -= npages;
+   up_write(&current->mm->mmap_sem);
+}
+EXPORT_SYMBOL_GPL(decrement_locked_vm);
-- 
2.0.0

[PATCH v4 05/16] powerpc/iommu: Fix comments with it_page_shift

2014-07-30 Thread Alexey Kardashevskiy
There are a couple of commented-out debug prints which still use
IOMMU_PAGE_SHIFT(), which is no longer defined for POWERPC; replace
them with it_page_shift.
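The debug prints convert a TCE entry index into an I/O bus address. Once tables can use different IOMMU page sizes, the shift has to come from the table itself. A minimal userspace sketch follows; struct iommu_table_model is a hypothetical stand-in carrying only it_page_shift.

```c
#include <assert.h>

/* Hypothetical stand-in: only the field the debug prints use. */
struct iommu_table_model {
	unsigned long it_page_shift;   /* log2 of this table's IOMMU page size */
};

/* I/O bus address of a TCE entry, as in "entry << tbl->it_page_shift". */
static unsigned long entry_to_ioba(const struct iommu_table_model *tbl,
				   unsigned long entry)
{
	return entry << tbl->it_page_shift;
}
```

The same entry index maps to 0x3000 on a 4 KiB table but to 0x30000 on a 64 KiB table.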

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Gavin Shan 
---
 arch/powerpc/kernel/iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 88e3ec6..f84f799 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1037,7 +1037,7 @@ int iommu_tce_build(struct iommu_table *tbl, unsigned 
long entry,
 
/* if (unlikely(ret))
pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx 
ret=%d\n",
-   __func__, hwaddr, entry << IOMMU_PAGE_SHIFT(tbl),
+   __func__, hwaddr, entry << tbl->it_page_shift,
hwaddr, ret); */
 
return ret;
@@ -1056,7 +1056,7 @@ int iommu_put_tce_user_mode(struct iommu_table *tbl, 
unsigned long entry,
direction != DMA_TO_DEVICE, &page);
if (unlikely(ret != 1)) {
/* pr_err("iommu_tce: get_user_pages_fast failed tce=%lx 
ioba=%lx ret=%d\n",
-   tce, entry << IOMMU_PAGE_SHIFT(tbl), ret); */
+   tce, entry << tbl->it_page_shift, ret); */
return -EFAULT;
}
hwaddr = (unsigned long) page_address(page) + offset;
-- 
2.0.0

[PATCH v4 08/16] powerpc/powernv: Convert/move set_bypass() callback to take_ownership()

2014-07-30 Thread Alexey Kardashevskiy
At the moment the iommu_table struct has a set_bypass() callback which
enables/disables DMA bypass on an IODA2 PHB. This is exposed to the POWERPC
IOMMU code, which calls this callback when external IOMMU users such as
VFIO are about to take over a PHB.

Since set_bypass() is not really an iommu_table function but a PE
function, and we have an ops struct per IOMMU owner, let's move
set_bypass() to the spapr_tce_iommu_ops struct.

As arch/powerpc/kernel/iommu.c is more about POWERPC IOMMU tables and
has very little to do with PEs, this moves take_ownership() calls to
the VFIO SPAPR TCE driver.

This renames set_bypass() to take_ownership() as it does not necessarily
just enable bypass; it may do something else or more, so let's give it
a generic name. The bool parameter is inverted.
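Calling take_ownership(data, true) means an external user such as VFIO is taking the PE, so bypass must be turned off. That is why pnv_ioda2_take_ownership() passes !enable to pnv_pci_ioda2_set_bypass(). A userspace sketch of that ops-struct dispatch follows. All *_model names are hypothetical stand-ins for spapr_tce_iommu_ops, spapr_tce_iommu_group and pnv_ioda_pe, and no OPAL call is made.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PE; the flag models the bypass window. */
struct pe_model {
	bool bypass_enabled;
};

struct iommu_group_model;

/* Stand-in for struct spapr_tce_iommu_ops with the new callback. */
struct iommu_ops_model {
	void (*take_ownership)(struct iommu_group_model *data, bool enable);
};

/* Stand-in for struct spapr_tce_iommu_group. */
struct iommu_group_model {
	void *iommu_owner;                   /* the PE, for IODA2 */
	const struct iommu_ops_model *ops;
};

static void set_bypass_model(struct pe_model *pe, bool enable)
{
	pe->bypass_enabled = enable;
}

/* Taking ownership (enable == true) disables bypass, and vice versa. */
static void ioda2_take_ownership_model(struct iommu_group_model *data,
				       bool enable)
{
	struct pe_model *pe = data->iommu_owner;

	set_bypass_model(pe, !enable);
}

static const struct iommu_ops_model ioda2_ops_model = {
	.take_ownership = ioda2_take_ownership_model,
};
```

Routing the call through the ops struct keeps arch/powerpc/kernel/iommu.c free of PE knowledge. The VFIO driver calls grp->ops->take_ownership() and the platform decides what taking ownership involves.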

Signed-off-by: Alexey Kardashevskiy 
Reviewed-by: Gavin Shan 
---
 arch/powerpc/include/asm/iommu.h  |  1 -
 arch/powerpc/include/asm/tce.h|  2 ++
 arch/powerpc/kernel/iommu.c   | 12 
 arch/powerpc/platforms/powernv/pci-ioda.c | 18 +++---
 drivers/vfio/vfio_iommu_spapr_tce.c   | 17 +
 5 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 84ee339..2b0b01d 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -77,7 +77,6 @@ struct iommu_table {
 #ifdef CONFIG_IOMMU_API
struct iommu_group *it_group;
 #endif
-   void (*set_bypass)(struct iommu_table *tbl, bool enable);
 };
 
 /* Pure 2^n version of get_order */
diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
index 8bfe98f..5ee4987 100644
--- a/arch/powerpc/include/asm/tce.h
+++ b/arch/powerpc/include/asm/tce.h
@@ -58,6 +58,8 @@ struct spapr_tce_iommu_ops {
struct iommu_table *(*get_table)(
struct spapr_tce_iommu_group *data,
phys_addr_t addr);
+   void (*take_ownership)(struct spapr_tce_iommu_group *data,
+   bool enable);
 };
 
 struct spapr_tce_iommu_group {
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index e203314..06984d5 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1116,14 +1116,6 @@ int iommu_take_ownership(struct iommu_table *tbl)
memset(tbl->it_map, 0xff, sz);
iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size);
 
-   /*
-* Disable iommu bypass, otherwise the user can DMA to all of
-* our physical memory via the bypass window instead of just
-* the pages that has been explicitly mapped into the iommu
-*/
-   if (tbl->set_bypass)
-   tbl->set_bypass(tbl, false);
-
return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_take_ownership);
@@ -1138,10 +1130,6 @@ void iommu_release_ownership(struct iommu_table *tbl)
/* Restore bit#0 set by iommu_init_table() */
if (tbl->it_offset == 0)
set_bit(0, tbl->it_map);
-
-   /* The kernel owns the device now, we can restore the iommu bypass */
-   if (tbl->set_bypass)
-   tbl->set_bypass(tbl, true);
 }
 EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 495137b..f828c57 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -709,10 +709,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
 }
 
-static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
+static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
 {
-   struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32.table);
uint16_t window_id = (pe->pe_number << 1 ) + 1;
int64_t rc;
 
@@ -752,15 +750,21 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb 
*phb,
/* TVE #1 is selected by PCI address bit 59 */
pe->tce_bypass_base = 1ull << 59;
 
-   /* Install set_bypass callback for VFIO */
-   pe->tce32.table.set_bypass = pnv_pci_ioda2_set_bypass;
-
/* Enable bypass by default */
-   pnv_pci_ioda2_set_bypass(&pe->tce32.table, true);
+   pnv_pci_ioda2_set_bypass(pe, true);
+}
+
+static void pnv_ioda2_take_ownership(struct spapr_tce_iommu_group *data,
+bool enable)
+{
+   struct pnv_ioda_pe *pe = data->iommu_owner;
+
+   pnv_pci_ioda2_set_bypass(pe, !enable);
 }
 
 static struct spapr_tce_iommu_ops pnv_pci_ioda2_ops = {
.get_table = pnv_ioda1_iommu_get_table,
+   .take_ownership = pnv_ioda2_take_ownership,
 };
 
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index

[PATCH v4 06/16] powerpc/powernv: Make invalidate() a callback

2014-07-30 Thread Alexey Kardashevskiy
At the moment pnv_pci_ioda_tce_invalidate() gets the PE pointer via
container_of(tbl). Since we are going to add Dynamic DMA windows, and
that means having 2 IOMMU tables per PE, this is not going to work.

This implements pnv_pci_ioda(1|2)_tce_invalidate as a pnv_ioda_pe callback.

This adds a pnv_iommu_table wrapper around iommu_table and stores a pointer
to the PE there. PNV's ppc_md.tce_build() call uses this to find the PE and
do the invalidation. This will be used later for Dynamic DMA windows too.

This registers invalidate() callbacks for IODA1 and IODA2:
- pnv_pci_ioda1_tce_invalidate;
- pnv_pci_ioda2_tce_invalidate.
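container_of(tbl, struct pnv_ioda_pe, tce32_table) works only while each PE embeds exactly one table at a known member. The approach described here instead keeps a PE back-pointer next to each table and dispatches through a per-PE tce_invalidate callback. The userspace sketch below follows those assumptions. The *_model types, the invalidation counter and the second "extra" table are hypothetical; the patch itself adds only the wrapper, and the second window arrives in later patches.

```c
#include <assert.h>
#include <stddef.h>

struct pe_model;

/* Hypothetical stand-in for struct iommu_table. */
struct iommu_table_model {
	unsigned long it_size;
};

/* Stand-in for struct pnv_iommu_table: the generic table plus a pointer
 * back to the owning PE. */
struct pnv_iommu_table_model {
	struct iommu_table_model table;
	struct pe_model *pe;
};

struct pe_model {
	struct pnv_iommu_table_model tce32;
	struct pnv_iommu_table_model extra;   /* hypothetical second window */
	void (*tce_invalidate)(struct pe_model *pe,
			       struct iommu_table_model *tbl);
	int invalidations;                    /* counts callback calls */
};

/* Recover the wrapper from the generic table pointer; the wrapper then
 * yields the PE without assuming which PE member the table is. */
static struct pe_model *table_to_pe(struct iommu_table_model *tbl)
{
	struct pnv_iommu_table_model *ptbl = (struct pnv_iommu_table_model *)
		((char *)tbl - offsetof(struct pnv_iommu_table_model, table));

	return ptbl->pe;
}

static void ioda1_invalidate_model(struct pe_model *pe,
				   struct iommu_table_model *tbl)
{
	(void)tbl;
	pe->invalidations++;
}

/* What a tce_build()-style caller would do after updating TCEs. */
static void invalidate_for_table(struct iommu_table_model *tbl)
{
	struct pe_model *pe = table_to_pe(tbl);

	pe->tce_invalidate(pe, tbl);
}
```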

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v4:
* changed commit log to explain why this change is needed
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 33 +++
 arch/powerpc/platforms/powernv/pci.c  | 31 +
 arch/powerpc/platforms/powernv/pci.h  | 13 +++-
 3 files changed, 47 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 9f28e18..007497f 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -462,7 +462,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, 
struct pci_dev *pdev
 
pe = &phb->ioda.pe_array[pdn->pe_number];
WARN_ON(get_dma_ops(&pdev->dev) != &dma_iommu_ops);
-   set_iommu_table_base(&pdev->dev, &pe->tce32_table);
+   set_iommu_table_base(&pdev->dev, &pe->tce32.table);
 }
 
 static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
@@ -489,7 +489,7 @@ static int pnv_pci_ioda_dma_set_mask(struct pnv_phb *phb,
} else {
dev_info(&pdev->dev, "Using 32-bit DMA via iommu\n");
set_dma_ops(&pdev->dev, &dma_iommu_ops);
-   set_iommu_table_base(&pdev->dev, &pe->tce32_table);
+   set_iommu_table_base(&pdev->dev, &pe->tce32.table);
}
return 0;
 }
@@ -499,7 +499,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, 
struct pci_bus *bus)
struct pci_dev *dev;
 
list_for_each_entry(dev, &bus->devices, bus_list) {
-   set_iommu_table_base_and_group(&dev->dev, &pe->tce32_table);
+   set_iommu_table_base_and_group(&dev->dev, &pe->tce32.table);
if (dev->subordinate)
pnv_ioda_setup_bus_dma(pe, dev->subordinate);
}
@@ -584,19 +584,6 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
pnv_ioda_pe *pe,
}
 }
 
-void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
-__be64 *startp, __be64 *endp, bool rm)
-{
-   struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
-   struct pnv_phb *phb = pe->phb;
-
-   if (phb->type == PNV_PHB_IODA1)
-   pnv_pci_ioda1_tce_invalidate(pe, tbl, startp, endp, rm);
-   else
-   pnv_pci_ioda2_tce_invalidate(pe, tbl, startp, endp, rm);
-}
-
 static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
  struct pnv_ioda_pe *pe, unsigned int base,
  unsigned int segs)
@@ -654,9 +641,11 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
}
 
/* Setup linux iommu table */
-   tbl = &pe->tce32_table;
+   tbl = &pe->tce32.table;
pnv_pci_setup_iommu_table(tbl, addr, TCE32_TABLE_SIZE * segs,
  base << 28, IOMMU_PAGE_SHIFT_4K);
+   pe->tce32.pe = pe;
+   pe->tce_invalidate = pnv_pci_ioda1_tce_invalidate;
 
/* OPAL variant of P7IOC SW invalidated TCEs */
swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
@@ -693,7 +682,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 static void pnv_pci_ioda2_set_bypass(struct iommu_table *tbl, bool enable)
 {
struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
- tce32_table);
+ tce32.table);
uint16_t window_id = (pe->pe_number << 1 ) + 1;
int64_t rc;
 
@@ -734,10 +723,10 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb 
*phb,
pe->tce_bypass_base = 1ull << 59;
 
/* Install set_bypass callback for VFIO */
-   pe->tce32_table.set_bypass = pnv_pci_ioda2_set_bypass;
+   pe->tce32.table.set_bypass = pnv_pci_ioda2_set_bypass;
 
/* Enable bypass by default */
-   pnv_pci_ioda2_set_bypass(&pe->tce32_table, true);
+   pnv_pci_ioda2_set_bypass(&pe->tce32.table, true);
 }
 
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
@@ -785,9 +774,11 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
}
 
/* Setup linux iommu table */
-   tbl = &pe->tce32_table;
+   tbl = &pe-

[PATCH v4 09/16] powerpc/iommu: Fix IOMMU ownership control functions

2014-07-30 Thread Alexey Kardashevskiy
This adds missing locks in iommu_take_ownership()/
iommu_release_ownership().

This marks all pages busy in iommu_table::it_map in order to catch
errors if there is an attempt to use this table while ownership over it
is taken.

This only clears TCE content if no page is marked busy in it_map.
Clearing must be done outside of the table locks, as iommu_clear_tce(),
called from iommu_clear_tces_and_put_pages(), takes the pool lock itself.
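The it_map handling in the patched iommu_take_ownership() can be modelled with a small byte-array bitmap. In this userspace sketch, table_model and the helper names are hypothetical and the pool spinlocks are omitted. Bit 0 plays the role of the entry that iommu_init_table() reserves when it_offset is 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define IT_SIZE 64   /* hypothetical table size in entries */

struct table_model {
	unsigned long it_offset;
	unsigned char it_map[(IT_SIZE + 7) >> 3];   /* one bit per entry */
};

static bool test_bit_model(const unsigned char *map, unsigned long nr)
{
	return map[nr >> 3] & (1u << (nr & 7));
}

static bool map_empty(const unsigned char *map, size_t bytes)
{
	for (size_t i = 0; i < bytes; i++)
		if (map[i])
			return false;
	return true;
}

/* Model of the patched iommu_take_ownership(): drop the reserved bit 0,
 * fail with -EBUSY (restoring bit 0) if any entry is still in use,
 * otherwise mark every entry busy so the kernel cannot allocate from it. */
static int take_ownership_model(struct table_model *tbl)
{
	size_t sz = sizeof(tbl->it_map);
	bool bit0 = false;

	if (tbl->it_offset == 0) {
		bit0 = test_bit_model(tbl->it_map, 0);
		tbl->it_map[0] &= (unsigned char)~1u;
	}
	if (!map_empty(tbl->it_map, sz)) {
		if (bit0)
			tbl->it_map[0] |= 1u;
		return -EBUSY;
	}
	memset(tbl->it_map, 0xff, sz);
	return 0;
}
```

Restoring bit 0 on the failure path matters because the kernel still owns the table in that case. Without it, the reserved entry would look free.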

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/kernel/iommu.c | 36 +---
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 06984d5..c94b11d 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1103,33 +1103,55 @@ EXPORT_SYMBOL_GPL(iommu_put_tce_user_mode);
 
 int iommu_take_ownership(struct iommu_table *tbl)
 {
-   unsigned long sz = (tbl->it_size + 7) >> 3;
+   unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
+   int ret = 0, bit0 = 0;
+
+   spin_lock_irqsave(&tbl->large_pool.lock, flags);
+   for (i = 0; i < tbl->nr_pools; i++)
+   spin_lock(&tbl->pools[i].lock);
 
if (tbl->it_offset == 0)
-   clear_bit(0, tbl->it_map);
+   bit0 = test_and_clear_bit(0, tbl->it_map);
 
if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
pr_err("iommu_tce: it_map is not empty");
-   return -EBUSY;
+   ret = -EBUSY;
+   if (bit0)
+   set_bit(0, tbl->it_map);
+   } else {
+   memset(tbl->it_map, 0xff, sz);
}
 
-   memset(tbl->it_map, 0xff, sz);
-   iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size);
+   for (i = 0; i < tbl->nr_pools; i++)
+   spin_unlock(&tbl->pools[i].lock);
+   spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
 
-   return 0;
+   if (!ret)
+   iommu_clear_tces_and_put_pages(tbl, tbl->it_offset,
+   tbl->it_size);
+   return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_take_ownership);
 
 void iommu_release_ownership(struct iommu_table *tbl)
 {
-   unsigned long sz = (tbl->it_size + 7) >> 3;
+   unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
 
iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size);
+
+   spin_lock_irqsave(&tbl->large_pool.lock, flags);
+   for (i = 0; i < tbl->nr_pools; i++)
+   spin_lock(&tbl->pools[i].lock);
+
memset(tbl->it_map, 0, sz);
 
/* Restore bit#0 set by iommu_init_table() */
if (tbl->it_offset == 0)
set_bit(0, tbl->it_map);
+
+   for (i = 0; i < tbl->nr_pools; i++)
+   spin_unlock(&tbl->pools[i].lock);
+   spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
 }
 EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
-- 
2.0.0

[PATCH v4 12/16] powerpc/pseries/lpar: Enable VFIO

2014-07-30 Thread Alexey Kardashevskiy
The previous patch introduced the iommu_table_ops::set_and_get() callback,
which effectively disabled VFIO on pseries. This implements set_and_get()
for pseries/lpar so VFIO can work under pHyp again.

Since the set_and_get() callback must return the old TCE, it has to do
H_GET_TCE for every TCE being replaced; therefore VFIO under pHyp
is expected to be slow.
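tce_set_and_get_pSeriesLP() writes at most one page of TCEs per H_PUT_TCE_INDIRECT call and advances tcenum after each chunk. In the single-TCE path, old_tces is filled from index 0 (old_tces[i++]). The indirect path as posted stores to old_tces[tcenum + l], which is the absolute entry number. If old_tces is meant to be indexed from 0 in both paths, the indirect path needs an index relative to the first entry, as in the sketch below. This is a userspace model: hv_table is a hypothetical array standing in for the hypervisor's table and its hcalls, and CHUNK stands in for 4096/TCE_ENTRY_SIZE.

```c
#include <assert.h>

#define CHUNK 4          /* stands in for 4096 / TCE_ENTRY_SIZE */
#define TABLE_SIZE 32    /* hypothetical number of entries */

/* Stands in for the hypervisor-owned TCE table. */
static unsigned long hv_table[TABLE_SIZE];

/* Model of the chunked set-and-get loop: for each entry, read the old value
 * (H_GET_TCE in the patch), then write the new one (H_PUT_TCE_INDIRECT).
 * old_tces[0] corresponds to the first entry, tcenum_start. */
static void set_and_get_model(long tcenum, long npages,
			      unsigned long first_tce, unsigned long *old_tces)
{
	const long tcenum_start = tcenum;
	unsigned long tce = first_tce;

	while (npages > 0) {
		long limit = npages < CHUNK ? npages : CHUNK;

		for (long l = 0; l < limit; l++) {
			if (old_tces)
				old_tces[tcenum - tcenum_start + l] =
					hv_table[tcenum + l];
			hv_table[tcenum + l] = tce++;
		}
		npages -= limit;
		tcenum += limit;
	}
}
```

Suppose the caller sizes old_tces to npages and replaces 10 entries starting at entry 5. With the posted indexing, the old values would land at indices 5 to 14 of a 10-element buffer.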

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/pseries/iommu.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 793f002..d3cded1 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -138,13 +138,14 @@ static void tce_freemulti_pSeriesLP(struct iommu_table*, 
long, long);
 
 static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
long npages, unsigned long uaddr,
+   unsigned long *old_tces,
enum dma_data_direction direction,
struct dma_attrs *attrs)
 {
u64 rc = 0;
u64 proto_tce, tce;
u64 rpn;
-   int ret = 0;
+   int ret = 0, i = 0;
long tcenum_start = tcenum, npages_start = npages;
 
rpn = __pa(uaddr) >> TCE_SHIFT;
@@ -154,6 +155,9 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, 
long tcenum,
 
while (npages--) {
tce = proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT;
+   if (old_tces)
+   plpar_tce_get((u64)tbl->it_index, (u64)tcenum << 12,
+   &old_tces[i++]);
rc = plpar_tce_put((u64)tbl->it_index, (u64)tcenum << 12, tce);
 
if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
@@ -179,8 +183,9 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, 
long tcenum,
 
 static DEFINE_PER_CPU(__be64 *, tce_page);
 
-static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
+static int tce_set_and_get_pSeriesLP(struct iommu_table *tbl, long tcenum,
 long npages, unsigned long uaddr,
+unsigned long *old_tces,
 enum dma_data_direction direction,
 struct dma_attrs *attrs)
 {
@@ -195,6 +200,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 
if ((npages == 1) || !firmware_has_feature(FW_FEATURE_MULTITCE)) {
return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
+  old_tces,
   direction, attrs);
}
 
@@ -211,6 +217,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
if (!tcep) {
local_irq_restore(flags);
return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
+   old_tces,
direction, attrs);
}
__get_cpu_var(tce_page) = tcep;
@@ -232,6 +239,10 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
for (l = 0; l < limit; l++) {
tcep[l] = cpu_to_be64(proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
rpn++;
+   if (old_tces)
+   plpar_tce_get((u64)tbl->it_index,
+   (u64)(tcenum + l) << 12,
+   &old_tces[tcenum - tcenum_start + l]);
}
 
rc = plpar_tce_put_indirect((u64)tbl->it_index,
@@ -262,6 +273,15 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
return ret;
 }
 
+static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
+long npages, unsigned long uaddr,
+enum dma_data_direction direction,
+struct dma_attrs *attrs)
+{
+   return tce_set_and_get_pSeriesLP(tbl, tcenum, npages, uaddr, NULL,
+   direction, attrs);
+}
+
 static void tce_free_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages)
 {
u64 rc;
@@ -637,6 +657,7 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
 
 struct iommu_table_ops iommu_table_lpar_multi_ops = {
.set = tce_buildmulti_pSeriesLP,
+   .set_and_get = tce_set_and_get_pSeriesLP,
.clear = tce_freemulti_pSeriesLP,
.get = tce_get_pSeriesLP
 };
-- 
2.0.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 11/16] powerpc/powernv: Release replaced TCE

2014-07-30 Thread Alexey Kardashevskiy
At the moment, writing a new TCE value to the IOMMU table fails with EBUSY
if a valid entry is already present. However, the PAPR specification allows
the guest to write a new TCE value without clearing the old one first.

This adds a set_and_get() callback to iommu_table_ops which does the same
thing as set() but also returns the replaced TCE(s), so the caller can
release the pages afterwards.

This makes iommu_tce_build() put the pages of the TCEs returned by set_and_get().
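
A minimal sketch of the set_and_get() contract, assuming the table is a plain array of 64-bit TCEs and using C11 atomic_exchange() so that installing the new TCE and fetching the old one is a single lockless step. This is not the powernv implementation. The permission bits use the same values as TCE_PCI_READ/TCE_PCI_WRITE (0x1/0x2), and toy_release_page() is a hypothetical stand-in for put_page().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_TCE_PCI_READ  0x1ULL /* same value as TCE_PCI_READ */
#define TOY_TCE_PCI_WRITE 0x2ULL /* same value as TCE_PCI_WRITE */
#define TOY_NTCES         8

static _Atomic uint64_t toy_tces[TOY_NTCES];
static unsigned long toy_pages_released;

/* Hypothetical stand-in for put_page() on the page behind an old TCE. */
static void toy_release_page(uint64_t old_tce)
{
	(void)old_tce;
	toy_pages_released++;
}

/* Install new_tce at index and return the value it replaced. */
static uint64_t toy_xchg_set_and_get(long index, uint64_t new_tce)
{
	assert(index >= 0 && index < TOY_NTCES);
	return atomic_exchange(&toy_tces[index], new_tce);
}

/*
 * Caller side, modelled on iommu_tce_build() in the patch: overwrite
 * unconditionally, then release the old page only if the old TCE was
 * valid, i.e. had a permission bit set.
 */
static void toy_tce_build(long index, uint64_t new_tce)
{
	uint64_t old = toy_xchg_set_and_get(index, new_tce);

	if (old & (TOY_TCE_PCI_READ | TOY_TCE_PCI_WRITE))
		toy_release_page(old);
}
```

Because the exchange is atomic, two racing writers to the same entry each get back a distinct old value, so each replaced page is released exactly once without a pool lock.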

Since we now depend on the permission bits in TCE entries, this preserves
those bits in iommu_put_tce_user_mode().

This removes the use of pool locks, as those locks protect TCE allocation
rather than IOMMU table access, and the new set_and_get() callback provides
a lockless way to release pages safely.

This disables external IOMMU use (i.e. VFIO) for IOMMUs which do not
implement the set_and_get() callback. Therefore, the "powernv" platform is
the only supported one.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v4:
* this is merge+rework of
powerpc/powernv: Return non-zero TCE from pnv_tce_build
powerpc/iommu: Implement put_page() if TCE had non-zero value
powerpc/iommu: Extend ppc_md.tce_build(_rm) to return old TCE values
---
 arch/powerpc/include/asm/iommu.h |  6 ++
 arch/powerpc/kernel/iommu.c  | 28 +++-
 arch/powerpc/platforms/powernv/pci.c | 29 +++--
 3 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index c725e4a..4b13e4e 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -49,6 +49,12 @@ struct iommu_table_ops {
unsigned long uaddr,
enum dma_data_direction direction,
struct dma_attrs *attrs);
+   int (*set_and_get)(struct iommu_table *tbl,
+   long index, long npages,
+   unsigned long uaddr,
+   unsigned long *old_tces,
+   enum dma_data_direction direction,
+   struct dma_attrs *attrs);
void (*clear)(struct iommu_table *tbl,
long index, long npages);
unsigned long (*get)(struct iommu_table *tbl, long index);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 6a86788..ad52e00 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1007,9 +1007,6 @@ EXPORT_SYMBOL_GPL(iommu_tce_put_param_check);
 unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry)
 {
unsigned long oldtce;
-   struct iommu_pool *pool = get_pool(tbl, entry);
-
-   spin_lock(&(pool->lock));
 
oldtce = tbl->it_ops->get(tbl, entry);
if (oldtce & (TCE_PCI_WRITE | TCE_PCI_READ))
@@ -1017,8 +1014,6 @@ unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry)
else
oldtce = 0;
 
-   spin_unlock(&(pool->lock));
-
return oldtce;
 }
 EXPORT_SYMBOL_GPL(iommu_clear_tce);
@@ -1056,16 +1051,12 @@ int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
 {
int ret = -EBUSY;
unsigned long oldtce;
-   struct iommu_pool *pool = get_pool(tbl, entry);
 
-   spin_lock(&(pool->lock));
+   ret = tbl->it_ops->set_and_get(tbl, entry, 1, hwaddr, &oldtce,
+   direction, NULL);
 
-   oldtce = tbl->it_ops->get(tbl, entry);
-   /* Add new entry if it is not busy */
-   if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)))
-   ret = tbl->it_ops->set(tbl, entry, 1, hwaddr, direction, NULL);
-
-   spin_unlock(&(pool->lock));
+   if (oldtce & (TCE_PCI_WRITE | TCE_PCI_READ))
+   put_page(pfn_to_page(__pa(oldtce) >> PAGE_SHIFT));
 
/* if (unlikely(ret))
pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx ret=%d\n",
@@ -1092,6 +1083,7 @@ int iommu_put_tce_user_mode(struct iommu_table *tbl, unsigned long entry,
return -EFAULT;
}
hwaddr = (unsigned long) page_address(page) + offset;
+   hwaddr |= tce & (TCE_PCI_READ | TCE_PCI_WRITE);
 
ret = iommu_tce_build(tbl, entry, hwaddr, direction);
if (ret)
@@ -1110,6 +1102,16 @@ int iommu_take_ownership(struct iommu_table *tbl)
unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
int ret = 0, bit0 = 0;
 
+   /*
+* VFIO does not control TCE entry allocation, and the guest
+* can write new TCEs on top of existing ones, so iommu_tce_build()
+* must be able to release old pages. This requires the
+* set_and_get() callback, so if it is not implemented,
+* we disallow taking ownership of the table.
+*/
+   if (!tbl->it_ops->set_and_get)
+   return -EINVAL;
+
spin_lock_irqsave(&tbl->large_pool.lock, flags);
for (i = 0; i < tbl->nr_pools; i++)
  

[PATCH v4 10/16] powerpc: Move tce_xxx callbacks from ppc_md to iommu_table

2014-07-30 Thread Alexey Kardashevskiy
This adds an iommu_table_ops struct and puts a pointer to it into
the iommu_table struct. This moves the tce_build/tce_free/tce_get/tce_flush
callbacks from ppc_md to the new struct, where they really belong.

This adds an extra @ops parameter to iommu_init_table() to make sure
that we do not leave any IOMMU table without iommu_table_ops. @it_ops is
initialized at the very beginning, as iommu_init_table() calls
iommu_table_clear(), which already uses the callbacks.
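
A minimal sketch of why @it_ops has to be set first: the init routine immediately clears the table through the ops. All toy_* names are hypothetical; only the shape (an ops struct of function pointers hanging off the table and passed into the init function) follows the patch.

```c
#include <assert.h>
#include <stddef.h>

struct toy_ops_table;

/* Mirrors the shape of iommu_table_ops: per-table TCE callbacks. */
struct toy_table_ops {
	int (*set)(struct toy_ops_table *tbl, long index, long npages,
		   unsigned long uaddr);
	void (*clear)(struct toy_ops_table *tbl, long index, long npages);
	unsigned long (*get)(struct toy_ops_table *tbl, long index);
};

struct toy_ops_table {
	unsigned long entries[16];
	long size;
	const struct toy_table_ops *ops;
};

static void toy_clear(struct toy_ops_table *tbl, long index, long npages)
{
	for (long i = index; i < index + npages; i++)
		tbl->entries[i] = 0;
}

static int toy_set(struct toy_ops_table *tbl, long index, long npages,
		   unsigned long uaddr)
{
	for (long i = 0; i < npages; i++)
		tbl->entries[index + i] = uaddr + (unsigned long)i;
	return 0;
}

static unsigned long toy_get(struct toy_ops_table *tbl, long index)
{
	return tbl->entries[index];
}

static const struct toy_table_ops toy_ops = {
	.set = toy_set, .clear = toy_clear, .get = toy_get,
};

/*
 * Like iommu_init_table() after the patch: ops is a parameter and is
 * stored before anything calls through it (here, the initial clear).
 */
static struct toy_ops_table *toy_init_table(struct toy_ops_table *tbl,
					    long size,
					    const struct toy_table_ops *ops)
{
	assert(ops && size <= 16);
	tbl->size = size;
	tbl->ops = ops;
	tbl->ops->clear(tbl, 0, size);
	return tbl;
}
```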

This does s/tce_build/set/ and s/tce_free/clear/, and removes the "tce_"
prefixes for better readability.

This removes the tce_xxx_rm handlers from ppc_md as well but does not add
them to iommu_table_ops; this will be done later if we decide to support
TCE hypercalls in real mode.

This always uses tce_buildmulti_pSeriesLP/tce_freemulti_pSeriesLP as
callbacks for pseries, and changes the "multi" callbacks to fall back to
tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not
present. The reason is that we still have to support the "multitce=off"
boot parameter in disable_multitce(), and we do not want to walk through
all IOMMU tables in the system replacing the "multi" callbacks with
single ones.
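
The fallback can be sketched as a single "multi" callback that checks a feature flag at call time, so flipping the flag (as "multitce=off" would) needs no walk over existing tables. toy_fw_has_multitce, toy_build_single() and toy_build_multi() are hypothetical stand-ins for FW_FEATURE_MULTITCE, tce_build_pSeriesLP() and tce_buildmulti_pSeriesLP(); the batched path is simplified to one counted call.

```c
#include <assert.h>
#include <stdbool.h>

static bool toy_fw_has_multitce = true; /* stands in for FW_FEATURE_MULTITCE */
static int toy_single_calls, toy_multi_calls;

/* One hypercall per TCE (like tce_build_pSeriesLP()). */
static int toy_build_single(long tcenum, long npages)
{
	(void)tcenum;
	toy_single_calls += (int)npages;
	return 0;
}

/*
 * The only callback installed in the ops struct. It decides at call
 * time whether the batched path is usable, instead of the boot code
 * swapping callbacks in every table.
 */
static int toy_build_multi(long tcenum, long npages)
{
	if (npages == 1 || !toy_fw_has_multitce)
		return toy_build_single(tcenum, npages);
	toy_multi_calls++; /* one batched hypercall (simplified) */
	return 0;
}
```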

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h| 20 +++-
 arch/powerpc/include/asm/machdep.h  | 25 ---
 arch/powerpc/kernel/iommu.c | 50 -
 arch/powerpc/kernel/vio.c   |  5 ++-
 arch/powerpc/platforms/cell/iommu.c |  9 --
 arch/powerpc/platforms/pasemi/iommu.c   |  8 +++--
 arch/powerpc/platforms/powernv/pci-ioda.c   |  4 +--
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |  3 +-
 arch/powerpc/platforms/powernv/pci.c| 24 --
 arch/powerpc/platforms/powernv/pci.h|  1 +
 arch/powerpc/platforms/pseries/iommu.c  | 42 +---
 arch/powerpc/sysdev/dart_iommu.c| 13 
 12 files changed, 102 insertions(+), 102 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 2b0b01d..c725e4a 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -43,6 +43,22 @@
 extern int iommu_is_off;
 extern int iommu_force_on;
 
+struct iommu_table_ops {
+   int (*set)(struct iommu_table *tbl,
+   long index, long npages,
+   unsigned long uaddr,
+   enum dma_data_direction direction,
+   struct dma_attrs *attrs);
+   void (*clear)(struct iommu_table *tbl,
+   long index, long npages);
+   unsigned long (*get)(struct iommu_table *tbl, long index);
+   void (*flush)(struct iommu_table *tbl);
+};
+
+/* These are used by VIO */
+extern struct iommu_table_ops iommu_table_lpar_multi_ops;
+extern struct iommu_table_ops iommu_table_pseries_ops;
+
 /*
  * IOMAP_MAX_ORDER defines the largest contiguous block
  * of dma space we can get.  IOMAP_MAX_ORDER = 13
@@ -77,6 +93,7 @@ struct iommu_table {
 #ifdef CONFIG_IOMMU_API
struct iommu_group *it_group;
 #endif
+   struct iommu_table_ops *it_ops;
 };
 
 /* Pure 2^n version of get_order */
@@ -106,7 +123,8 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
  * structure
  */
 extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
-   int nid);
+   int nid,
+   struct iommu_table_ops *ops);
 
 struct spapr_tce_iommu_ops;
 #ifdef CONFIG_IOMMU_API
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index f92b0b5..0a2ec04 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -65,31 +65,6 @@ struct machdep_calls {
 * destroyed as well */
void(*hpte_clear_all)(void);
 
-   int (*tce_build)(struct iommu_table *tbl,
-long index,
-long npages,
-unsigned long uaddr,
-enum dma_data_direction direction,
-struct dma_attrs *attrs);
-   void(*tce_free)(struct iommu_table *tbl,
-   long index,
-   long npages);
-   unsigned long   (*tce_get)(struct iommu_table *tbl,
-   long index);
-   void(*tce_flush)(struct iommu_table *tbl);
-
-   /* _rm versions are for real mode use only */
-   int (*tce_build_rm)(struct iommu_table *tbl,
-long index,
-long npages,
-unsigned long uaddr,
-enum dma_data_direction direction,
-   

[PATCH v4 14/16] vfio: powerpc/spapr: Reuse locked_vm accounting helpers

2014-07-30 Thread Alexey Kardashevskiy
There are helpers to account locked pages in the locked_vm counter; this
reuses them in the VFIO-SPAPR-IOMMU driver.

While we are here, update the comment explaining why RLIMIT_MEMLOCK
might need to be bigger than the entire guest RAM.
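
The accounting these helpers perform can be sketched from the open-coded version this patch removes: add the pages to locked_vm unless that would exceed RLIMIT_MEMLOCK for a caller without CAP_IPC_LOCK. struct toy_mm and the toy_* functions are hypothetical, and the mmap_sem locking is omitted. The test reproduces the example from the updated comment: four IOMMU groups with a 2GB DMA window each (4K IOMMU pages) are charged 8GB regardless of guest RAM, assuming 64K system pages.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_mm {
	unsigned long locked_vm;  /* pages currently accounted */
	unsigned long lock_limit; /* RLIMIT_MEMLOCK in pages */
	bool cap_ipc_lock;        /* caller has CAP_IPC_LOCK */
};

/* Charge npages, or fail with -ENOMEM if the limit would be exceeded. */
static int toy_try_increment_locked_vm(struct toy_mm *mm, unsigned long npages)
{
	unsigned long locked = mm->locked_vm + npages;

	if (locked > mm->lock_limit && !mm->cap_ipc_lock)
		return -ENOMEM;
	mm->locked_vm = locked;
	return 0;
}

static void toy_decrement_locked_vm(struct toy_mm *mm, unsigned long npages)
{
	assert(npages <= mm->locked_vm);
	mm->locked_vm -= npages;
}

/* System pages for a window of it_size TCEs of (1 << it_page_shift) bytes. */
static unsigned long toy_window_pages(unsigned long it_size,
				      unsigned int it_page_shift,
				      unsigned int page_shift)
{
	return (it_size << it_page_shift) >> page_shift;
}
```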

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v4:
* added comment explaining how big the ulimit should be
* used try_increment_locked_vm/decrement_locked_vm
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index d9845af..6ed0fc3 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -58,7 +58,6 @@ static void tce_iommu_take_ownership_notify(struct spapr_tce_iommu_group *data,
 static int tce_iommu_enable(struct tce_container *container)
 {
int ret = 0;
-   unsigned long locked, lock_limit, npages;
struct iommu_table *tbl;
struct spapr_tce_iommu_group *data;
 
@@ -92,24 +91,24 @@ static int tce_iommu_enable(struct tce_container *container)
 * Also we don't have a nice way to fail on H_PUT_TCE due to ulimits,
 * that would effectively kill the guest at random points, much better
 * enforcing the limit based on the max that the guest can map.
+*
+* Unfortunately, at the moment it counts whole tables, no matter how
+* much memory the guest has. E.g. for a 4GB guest and 4 IOMMU groups,
+* each with a 2GB DMA window, 8GB will be counted here. The reason is
+* that we cannot tell the amount of RAM used by the guest here, as
+* this information is only available from KVM and VFIO is KVM
+* agnostic.
 */
tbl = data->ops->get_table(data, TCE_DEFAULT_WINDOW);
if (!tbl)
return -ENXIO;
 
-   down_write(&current->mm->mmap_sem);
-   npages = (tbl->it_size << IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
-   locked = current->mm->locked_vm + npages;
-   lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-   if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
-   pr_warn("RLIMIT_MEMLOCK (%ld) exceeded\n",
-   rlimit(RLIMIT_MEMLOCK));
-   ret = -ENOMEM;
-   } else {
-   current->mm->locked_vm += npages;
-   container->enabled = true;
-   }
-   up_write(&current->mm->mmap_sem);
+   ret = try_increment_locked_vm((tbl->it_size << IOMMU_PAGE_SHIFT_4K) >>
+   PAGE_SHIFT);
+   if (ret)
+   return ret;
+
+   container->enabled = true;
 
return ret;
 }
@@ -135,10 +134,8 @@ static void tce_iommu_disable(struct tce_container *container)
if (!tbl)
return;
 
-   down_write(&current->mm->mmap_sem);
-   current->mm->locked_vm -= (tbl->it_size <<
-   IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
-   up_write(&current->mm->mmap_sem);
+   decrement_locked_vm((tbl->it_size << IOMMU_PAGE_SHIFT_4K) >>
+   PAGE_SHIFT);
 }
 
 static void *tce_iommu_open(unsigned long arg)
-- 
2.0.0

[PATCH v4 13/16] powerpc/powernv: Implement Dynamic DMA windows (DDW) for IODA

2014-07-30 Thread Alexey Kardashevskiy
SPAPR defines an interface to create additional DMA windows dynamically.
"Dynamically" means that the window is not allocated when the guest starts;
the guest can request it later. In practice, existing Linux guests check
for the capability and, if it is present, create and map one big DMA
window as large as the entire guest RAM.

SPAPR defines 4 RTAS calls for this feature, which userspace implements.
This adds 4 callbacks to the spapr_tce_iommu_ops struct:
1. query - ibm,query-pe-dma-window - returns the number of windows which
can be created and the supported page sizes (one window, any page size);
2. create - ibm,create-pe-dma-window - creates a window;
3. remove - ibm,remove-pe-dma-window - removes a window; only the additional
window created by create() can be removed, the default 32-bit window cannot
be removed as guests do not expect new windows to start from zero;
4. reset - ibm,reset-pe-dma-window - resets the DMA window configuration
to the default state; for now it only removes the additional window if one
was created.
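
A minimal state-machine sketch of the four callbacks as described above: at most one additional window, query reports nothing available while it exists, remove only accepts the additional window, and reset drops it if present. The toy_pe type, function names and returned error codes are hypothetical; the mask uses the DDW_PGSIZE_* values from the tce.h hunk, and the accepted page shifts (12, 16, 24) are assumed to be the ones matching that mask.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DDW_PGSIZE_4K  0x01 /* values from the asm/tce.h hunk */
#define DDW_PGSIZE_64K 0x02
#define DDW_PGSIZE_16M 0x04

struct toy_pe {
	bool window64_active; /* the additional (dynamic) window exists */
	uint32_t page_shift64;
};

static long toy_query(const struct toy_pe *pe, uint32_t *windows_available,
		      uint32_t *page_size_mask)
{
	*windows_available = pe->window64_active ? 0 : 1;
	*page_size_mask = pe->window64_active ? 0 :
		DDW_PGSIZE_4K | DDW_PGSIZE_64K | DDW_PGSIZE_16M;
	return 0;
}

static long toy_create(struct toy_pe *pe, uint32_t page_shift)
{
	if (pe->window64_active)
		return -EBUSY;
	if (page_shift != 12 && page_shift != 16 && page_shift != 24)
		return -EINVAL;
	pe->window64_active = true;
	pe->page_shift64 = page_shift;
	return 0;
}

/* Only the additional window can be removed, never the default one. */
static long toy_remove(struct toy_pe *pe, bool is_default_window)
{
	if (is_default_window || !pe->window64_active)
		return -EINVAL;
	pe->window64_active = false;
	return 0;
}

/* Back to the default configuration: drop the extra window if any. */
static long toy_reset(struct toy_pe *pe)
{
	pe->window64_active = false;
	return 0;
}
```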

The next patch adds the corresponding ioctls to the VFIO SPAPR TCE driver
to pass RTAS calls from userspace to the IODA code.
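
The create() callback in this patch sizes the new TCE table as one 8-byte entry per IOMMU page of the window, i.e. (1 << (window_shift - page_shift)) * 8 bytes. A quick sketch of that arithmetic; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed for a TCE table covering 2^window_shift bytes of bus
 * address space with 2^page_shift-byte IOMMU pages, 8 bytes per TCE.
 */
static uint64_t toy_tce_table_size(unsigned int window_shift,
				   unsigned int page_shift)
{
	assert(window_shift >= page_shift && window_shift - page_shift < 61);
	return (1ULL << (window_shift - page_shift)) * 8;
}
```

For a 64GB window (window_shift 36), this gives an 8MB table with 64K pages, 128MB with 4K pages and 32KB with 16M pages, so the IOMMU page size has a large effect on table memory.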

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/tce.h|  21 
 arch/powerpc/platforms/powernv/pci-ioda.c | 158 +-
 arch/powerpc/platforms/powernv/pci.h  |   2 +
 3 files changed, 180 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
index 5ee4987..583463b 100644
--- a/arch/powerpc/include/asm/tce.h
+++ b/arch/powerpc/include/asm/tce.h
@@ -60,6 +60,27 @@ struct spapr_tce_iommu_ops {
phys_addr_t addr);
void (*take_ownership)(struct spapr_tce_iommu_group *data,
bool enable);
+
+   /* Dynamic DMA window */
+   /* Page size flags for ibm,query-pe-dma-window */
+#define DDW_PGSIZE_4K   0x01
+#define DDW_PGSIZE_64K  0x02
+#define DDW_PGSIZE_16M  0x04
+#define DDW_PGSIZE_32M  0x08
+#define DDW_PGSIZE_64M  0x10
+#define DDW_PGSIZE_128M 0x20
+#define DDW_PGSIZE_256M 0x40
+#define DDW_PGSIZE_16G  0x80
+   long (*query)(struct spapr_tce_iommu_group *data,
+   __u32 *windows_available,
+   __u32 *page_size_mask);
+   long (*create)(struct spapr_tce_iommu_group *data,
+   __u32 page_shift,
+   __u32 window_shift,
+   struct iommu_table **ptbl);
+   long (*remove)(struct spapr_tce_iommu_group *data,
+   struct iommu_table *tbl);
+   long (*reset)(struct spapr_tce_iommu_group *data);
 };
 
 struct spapr_tce_iommu_group {
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7482518..6a847b2 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -754,6 +754,24 @@ static void pnv_pci_ioda2_setup_bypass_pe(struct pnv_phb *phb,
pnv_pci_ioda2_set_bypass(pe, true);
 }
 
+static struct iommu_table *pnv_ioda2_iommu_get_table(
+   struct spapr_tce_iommu_group *data,
+   phys_addr_t addr)
+{
+   struct pnv_ioda_pe *pe = data->iommu_owner;
+
+   if (addr == TCE_DEFAULT_WINDOW)
+   return &pe->tce32.table;
+
+   if (pnv_pci_ioda_check_addr(&pe->tce64.table, addr))
+   return &pe->tce64.table;
+
+   if (pnv_pci_ioda_check_addr(&pe->tce32.table, addr))
+   return &pe->tce32.table;
+
+   return NULL;
+}
+
 static void pnv_ioda2_take_ownership(struct spapr_tce_iommu_group *data,
 bool enable)
 {
@@ -762,9 +780,147 @@ static void pnv_ioda2_take_ownership(struct spapr_tce_iommu_group *data,
pnv_pci_ioda2_set_bypass(pe, !enable);
 }
 
+static long pnv_pci_ioda2_ddw_query(struct spapr_tce_iommu_group *data,
+   __u32 *windows_available, __u32 *page_size_mask)
+{
+   struct pnv_ioda_pe *pe = data->iommu_owner;
+
+   if (pe->tce64_active) {
+   *page_size_mask = 0;
+   *windows_available = 0;
+   } else {
+   *page_size_mask =
+   DDW_PGSIZE_4K |
+   DDW_PGSIZE_64K |
+   DDW_PGSIZE_16M;
+   *windows_available = 1;
+   }
+
+   return 0;
+}
+
+static long pnv_pci_ioda2_ddw_create(struct spapr_tce_iommu_group *data,
+   __u32 page_shift, __u32 window_shift,
+   struct iommu_table **ptbl)
+{
+   struct pnv_ioda_pe *pe = data->iommu_owner;
+   struct pnv_phb *phb = pe->phb;
+   struct page *tce_mem = NULL;
+   void *addr;
+   long ret;
+   unsigned long tce_table_size =
+   (1ULL << (window_shift - page_shift)) * 8;
+   unsigned order;
+   struct iommu_table *tbl64 = &pe->tce64.table;
+
+   if ((page_shift != 12) && (page_shift != 16) && (page_shif

[PATCH v4 15/16] vfio: powerpc/spapr: Use it_page_size

2014-07-30 Thread Alexey Kardashevskiy
This makes use of the IOMMU page size stored in the iommu_table struct
(it_page_shift), as the page size can differ from 4K.

This replaces the missing IOMMU_PAGE_SHIFT macro in the commented-out debug
code, as the recently introduced IOMMU_PAGE_XXX macros do not include
IOMMU_PAGE_SHIFT.
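
With a per-table page shift, every conversion between bus addresses, byte sizes and TCE indices goes through tbl->it_page_shift instead of the fixed 4K shift. A minimal sketch with a hypothetical struct carrying the same three fields the diff uses (it_offset, it_size, it_page_shift):

```c
#include <assert.h>
#include <stdint.h>

struct toy_iommu_table {
	uint64_t it_offset;         /* first TCE index of the window */
	uint64_t it_size;           /* number of TCEs */
	unsigned int it_page_shift; /* log2 of the IOMMU page size */
};

static uint64_t toy_window_start(const struct toy_iommu_table *tbl)
{
	return tbl->it_offset << tbl->it_page_shift;
}

static uint64_t toy_window_size(const struct toy_iommu_table *tbl)
{
	return tbl->it_size << tbl->it_page_shift;
}

/* TCE index for a bus address, as in (param.iova >> it_page_shift). */
static uint64_t toy_iova_to_entry(const struct toy_iommu_table *tbl,
				  uint64_t iova)
{
	return iova >> tbl->it_page_shift;
}

/* Number of TCEs a mapping of size bytes touches (size page-aligned). */
static uint64_t toy_size_to_npages(const struct toy_iommu_table *tbl,
				   uint64_t size)
{
	assert((size & ((1ULL << tbl->it_page_shift) - 1)) == 0);
	return size >> tbl->it_page_shift;
}
```

The same 2GB window is 524288 TCEs with 4K pages but only 32768 with 64K pages, which is why the hardcoded IOMMU_PAGE_SHIFT_4K would miscount for the latter.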

Signed-off-by: Alexey Kardashevskiy 
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 6ed0fc3..48b256c 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -103,7 +103,7 @@ static int tce_iommu_enable(struct tce_container *container)
if (!tbl)
return -ENXIO;
 
-   ret = try_increment_locked_vm((tbl->it_size << IOMMU_PAGE_SHIFT_4K) >>
+   ret = try_increment_locked_vm((tbl->it_size << tbl->it_page_shift) >>
PAGE_SHIFT);
if (ret)
return ret;
@@ -134,7 +134,7 @@ static void tce_iommu_disable(struct tce_container *container)
if (!tbl)
return;
 
-   decrement_locked_vm((tbl->it_size << IOMMU_PAGE_SHIFT_4K) >>
+   decrement_locked_vm((tbl->it_size << tbl->it_page_shift) >>
PAGE_SHIFT);
 }
 
@@ -207,8 +207,8 @@ static long tce_iommu_ioctl(void *iommu_data,
if (info.argsz < minsz)
return -EINVAL;
 
-   info.dma32_window_start = tbl->it_offset << IOMMU_PAGE_SHIFT_4K;
-   info.dma32_window_size = tbl->it_size << IOMMU_PAGE_SHIFT_4K;
+   info.dma32_window_start = tbl->it_offset << tbl->it_page_shift;
+   info.dma32_window_size = tbl->it_size << tbl->it_page_shift;
info.flags = 0;
 
if (copy_to_user((void __user *)arg, &info, minsz))
@@ -261,17 +261,17 @@ static long tce_iommu_ioctl(void *iommu_data,
if (ret)
return ret;
 
-   for (i = 0; i < (param.size >> IOMMU_PAGE_SHIFT_4K); ++i) {
+   for (i = 0; i < (param.size >> tbl->it_page_shift); ++i) {
ret = iommu_put_tce_user_mode(tbl,
-   (param.iova >> IOMMU_PAGE_SHIFT_4K) + i,
+   (param.iova >> tbl->it_page_shift) + i,
tce);
if (ret)
break;
-   tce += IOMMU_PAGE_SIZE_4K;
+   tce += IOMMU_PAGE_SIZE(tbl);
}
if (ret)
iommu_clear_tces_and_put_pages(tbl,
-   param.iova >> IOMMU_PAGE_SHIFT_4K, i);
+   param.iova >> tbl->it_page_shift, i);
 
iommu_flush_tce(tbl);
 
@@ -312,13 +312,13 @@ static long tce_iommu_ioctl(void *iommu_data,
BUG_ON(!tbl->it_group);
 
ret = iommu_tce_clear_param_check(tbl, param.iova, 0,
-   param.size >> IOMMU_PAGE_SHIFT_4K);
+   param.size >> tbl->it_page_shift);
if (ret)
return ret;
 
ret = iommu_clear_tces_and_put_pages(tbl,
-   param.iova >> IOMMU_PAGE_SHIFT_4K,
-   param.size >> IOMMU_PAGE_SHIFT_4K);
+   param.iova >> tbl->it_page_shift,
+   param.size >> tbl->it_page_shift);
iommu_flush_tce(tbl);
 
return ret;
-- 
2.0.0

[PATCH v4 16/16] vfio: powerpc/spapr: Enable Dynamic DMA windows

2014-07-30 Thread Alexey Kardashevskiy
This defines and implements the VFIO IOMMU API required to support
Dynamic DMA windows defined in the SPAPR specification. The ioctl handlers
implement the host-side part of the corresponding RTAS calls:
- VFIO_IOMMU_SPAPR_TCE_QUERY - ibm,query-pe-dma-window;
- VFIO_IOMMU_SPAPR_TCE_CREATE - ibm,create-pe-dma-window;
- VFIO_IOMMU_SPAPR_TCE_REMOVE - ibm,remove-pe-dma-window;
- VFIO_IOMMU_SPAPR_TCE_RESET - ibm,reset-pe-dma-window.

The VFIO IOMMU driver does basic sanity checks and calls the corresponding
SPAPR TCE functions. At the moment, only IODA2 (POWER8 PCI host bridge)
implements them.

This advertises the VFIO_IOMMU_SPAPR_TCE_FLAG_DDW capability via
VFIO_IOMMU_SPAPR_TCE_GET_INFO.

This calls reset() when the IOMMU is being disabled (which happens when
VFIO stops using it).
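
The QUERY handler in this patch follows the usual VFIO extensible-struct convention: copy in only the fields the kernel knows about (minsz, computed with offsetofend() on the last field), reject callers whose argsz is smaller, then fill in the outputs. A userspace sketch of that check; the struct layout is an assumption based only on the fields the handler touches, memcpy() stands in for copy_from_user()/copy_to_user(), and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as the kernel's offsetofend() helper. */
#define toy_offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Assumed layout; the real uapi struct may contain more fields. */
struct toy_tce_query {
	uint32_t argsz;
	uint32_t windows_available;
	uint32_t page_size_mask;
};

static long toy_query_ioctl(void *arg, uint32_t avail, uint32_t mask)
{
	struct toy_tce_query query;
	size_t minsz = toy_offsetofend(struct toy_tce_query, page_size_mask);

	memcpy(&query, arg, minsz);
	if (query.argsz < minsz)
		return -EINVAL; /* caller's struct is too old or too small */

	query.windows_available = avail;
	query.page_size_mask = mask;
	memcpy(arg, &query, minsz);
	return 0;
}
```

Userspace sets argsz to sizeof() of the struct it was compiled against, so a kernel that later appends fields can still serve older binaries.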

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/platforms/powernv/pci-ioda.c |   1 +
 drivers/vfio/vfio_iommu_spapr_tce.c   | 173 +-
 include/uapi/linux/vfio.h |  37 ++-
 3 files changed, 209 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 6a847b2..f51afe2 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -859,6 +859,7 @@ static long pnv_pci_ioda2_ddw_create(struct spapr_tce_iommu_group *data,
 
/* Copy "invalidate" register address */
tbl64->it_index = pe->tce32.table.it_index;
+   tbl64->it_group = pe->tce32.table.it_group;
tbl64->it_type = TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE |
TCE_PCI_SWINV_PAIR;
tbl64->it_map = (void *) 0xDEADBEEF; /* poison */
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 48b256c..32e2804 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -45,6 +45,7 @@ struct tce_container {
struct mutex lock;
struct iommu_group *grp;
bool enabled;
+   unsigned long start64;
 };
 
 
@@ -123,19 +124,36 @@ static void tce_iommu_disable(struct tce_container *container)
 
container->enabled = false;
 
-   if (!container->grp || !current->mm)
+   if (!container->grp)
return;
 
data = iommu_group_get_iommudata(container->grp);
if (!data || !data->iommu_owner || !data->ops->get_table)
return;
 
+   /* Try resetting, there might have been a 64bit window */
+   if (data->ops->reset)
+   data->ops->reset(data);
+
+   if (!current->mm)
+   return;
+
tbl = data->ops->get_table(data, TCE_DEFAULT_WINDOW);
if (!tbl)
return;
 
decrement_locked_vm((tbl->it_size << tbl->it_page_shift) >>
PAGE_SHIFT);
+
+   if (!container->start64)
+   return;
+
+   tbl = data->ops->get_table(data, container->start64);
+   if (!tbl)
+   return;
+
+   decrement_locked_vm((tbl->it_size << tbl->it_page_shift) >>
+   PAGE_SHIFT);
 }
 
 static void *tce_iommu_open(unsigned long arg)
@@ -210,6 +228,8 @@ static long tce_iommu_ioctl(void *iommu_data,
info.dma32_window_start = tbl->it_offset << tbl->it_page_shift;
info.dma32_window_size = tbl->it_size << tbl->it_page_shift;
info.flags = 0;
+   if (data->ops->query && data->ops->create && data->ops->remove)
+   info.flags |= VFIO_IOMMU_SPAPR_TCE_FLAG_DDW;
 
if (copy_to_user((void __user *)arg, &info, minsz))
return -EFAULT;
@@ -335,6 +355,157 @@ static long tce_iommu_ioctl(void *iommu_data,
tce_iommu_disable(container);
mutex_unlock(&container->lock);
return 0;
+
+   case VFIO_IOMMU_SPAPR_TCE_QUERY: {
+   struct vfio_iommu_spapr_tce_query query;
+   struct spapr_tce_iommu_group *data;
+
+   if (WARN_ON(!container->grp))
+   return -ENXIO;
+
+   data = iommu_group_get_iommudata(container->grp);
+
+   minsz = offsetofend(struct vfio_iommu_spapr_tce_query,
+   page_size_mask);
+
+   if (copy_from_user(&query, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (query.argsz < minsz)
+   return -EINVAL;
+
+   if (!data->ops->query || !data->iommu_owner)
+   return -ENOSYS;
+
+   ret = data->ops->query(data,
+   &query.windows_available,
+   &query.page_size_mask);
+
+   if (ret)
+   return ret;
+
+   if (copy_to_user((void __user *)arg, &query, minsz))
+   return -EFAULT;
+
+   return 0;
+   }
+   case VFIO_IOMMU_SPAPR_TCE_CREATE:

[PATCH v4 07/16] powerpc/spapr: vfio: Implement spapr_tce_iommu_ops

2014-07-30 Thread Alexey Kardashevskiy
Modern IBM POWERPC systems support multiple IOMMU tables per PE,
so we need a more reliable way (than container_of()) to get
a PE pointer from the iommu_table struct pointer used in IOMMU functions.

At the moment, IOMMU group data points to an iommu_table struct. This
introduces a spapr_tce_iommu_group struct which holds an iommu_owner
pointer and a pointer to a spapr_tce_iommu_ops struct. For IODA, iommu_owner
is a pointer to the pnv_ioda_pe struct; for the others, it is still a
pointer to the iommu_table struct. The ops struct corresponds to the type
which iommu_owner points to.

At the moment, get_table() is the only callback. It returns
the iommu_table for a given bus address.

As the IOMMU group data pointer now points to a variable type instead of
an iommu_table, the VFIO SPAPR TCE driver is fixed to use the new type.
This changes the tce_container struct to keep an iommu_group instead of
an iommu_table.

So, it was:
- iommu_table points to iommu_group via iommu_table::it_group;
- iommu_group points to iommu_table via iommu_group_get_iommudata();

now it is:
- iommu_table points to iommu_group via iommu_table::it_group;
- iommu_group points to spapr_tce_iommu_group via
iommu_group_get_iommudata();
- spapr_tce_iommu_group points to either (depending on .get_table()):
- iommu_table;
- pnv_ioda_pe;
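
A minimal sketch of the new indirection: the group data holds an opaque owner pointer plus an ops table, and callers ask get_table() for the table covering a bus address instead of assuming the group data is the table. It mirrors spapr_tce_get_default_table() from the iommu.c hunk, with hypothetical toy_* types; TOY_DEFAULT_WINDOW plays the role of TCE_DEFAULT_WINDOW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DEFAULT_WINDOW (~0ULL) /* like TCE_DEFAULT_WINDOW */

struct toy_tce_table {
	unsigned long it_size;      /* number of TCEs */
	unsigned int it_page_shift;
};

struct toy_group;

struct toy_group_ops {
	struct toy_tce_table *(*get_table)(struct toy_group *data,
					   uint64_t addr);
};

/* Like spapr_tce_iommu_group: the owner's type is known only to ops. */
struct toy_group {
	void *iommu_owner;
	const struct toy_group_ops *ops;
};

/* Default ops: the owner is a single table whose window starts at 0. */
static struct toy_tce_table *toy_get_default_table(struct toy_group *data,
						   uint64_t addr)
{
	struct toy_tce_table *tbl = data->iommu_owner;

	if (addr == TOY_DEFAULT_WINDOW)
		return tbl;
	if ((addr >> tbl->it_page_shift) < tbl->it_size)
		return tbl;
	return NULL;
}

static const struct toy_group_ops toy_default_ops = {
	.get_table = toy_get_default_table,
};
```

An IODA-style owner would install different ops whose get_table() casts iommu_owner to its PE type and picks between the 32-bit and 64-bit tables.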

This uses pnv_ioda1_iommu_get_table for both IODA1 and IODA2, but IODA2
will soon have its own pnv_ioda2_iommu_get_table, and
pnv_ioda1_iommu_get_table will then only be used for IODA1.

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/iommu.h|   6 ++
 arch/powerpc/include/asm/tce.h  |  15 
 arch/powerpc/kernel/iommu.c |  34 -
 arch/powerpc/platforms/powernv/pci-ioda.c   |  39 +-
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |   1 +
 arch/powerpc/platforms/powernv/pci.c|   2 +-
 arch/powerpc/platforms/pseries/iommu.c  |  10 ++-
 drivers/vfio/vfio_iommu_spapr_tce.c | 113 +---
 8 files changed, 184 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 42632c7..84ee339 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -108,13 +108,19 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
  */
 extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
int nid);
+
+struct spapr_tce_iommu_ops;
 #ifdef CONFIG_IOMMU_API
 extern void iommu_register_group(struct iommu_table *tbl,
+void *iommu_owner,
+struct spapr_tce_iommu_ops *ops,
 int pci_domain_number, unsigned long pe_num);
 extern int iommu_add_device(struct device *dev);
 extern void iommu_del_device(struct device *dev);
 #else
 static inline void iommu_register_group(struct iommu_table *tbl,
+   void *iommu_owner,
+   struct spapr_tce_iommu_ops *ops,
int pci_domain_number,
unsigned long pe_num)
 {
diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
index 743f36b..8bfe98f 100644
--- a/arch/powerpc/include/asm/tce.h
+++ b/arch/powerpc/include/asm/tce.h
@@ -50,5 +50,20 @@
 #define TCE_PCI_READ   0x1 /* read from PCI allowed */
 #define TCE_VB_WRITE   0x1 /* write from VB allowed */
 
+struct spapr_tce_iommu_group;
+
+#define TCE_DEFAULT_WINDOW ~(0ULL)
+
+struct spapr_tce_iommu_ops {
+   struct iommu_table *(*get_table)(
+   struct spapr_tce_iommu_group *data,
+   phys_addr_t addr);
+};
+
+struct spapr_tce_iommu_group {
+   void *iommu_owner;
+   struct spapr_tce_iommu_ops *ops;
+};
+
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_TCE_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index f84f799..e203314 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -877,24 +877,52 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size,
  */
 static void group_release(void *iommu_data)
 {
-   struct iommu_table *tbl = iommu_data;
-   tbl->it_group = NULL;
+   kfree(iommu_data);
 }
 
+static struct iommu_table *spapr_tce_get_default_table(
+   struct spapr_tce_iommu_group *data, phys_addr_t addr)
+{
+   struct iommu_table *tbl = data->iommu_owner;
+
+   if (addr == TCE_DEFAULT_WINDOW)
+   return tbl;
+
+   if ((addr >> tbl->it_page_shift) < tbl->it_size)
+   return tbl;
+
+   return NULL;
+}
+
+static struct spapr_tce_iommu_ops spapr_tce_default_ops = {
+   .get_table = spapr_tce_get_default_table
+};
+
 void iommu_register_group(struct iommu_table *tbl,
+   void *iommu_owner, struct spapr_tce_iommu_ops *ops,
  

[PATCH v4 01/16] rcu: Define notrace version of list_for_each_entry_rcu and list_entry_rcu

2014-07-30 Thread Alexey Kardashevskiy
This defines list_for_each_entry_rcu_notrace, which uses the new
list_entry_rcu_notrace, which in turn uses rcu_dereference_raw_notrace
instead of rcu_dereference_raw. This allows us to use
list_for_each_entry_rcu_notrace when the MMU is off (real mode).
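
Both new macros rest on the same container_of() step as their traced counterparts; only the pointer load differs (rcu_dereference_raw_notrace() instead of rcu_dereference_raw()). A plain userspace sketch of that step, with no RCU at all, shows how the loop cursor is recovered from the embedded list head. The toy_* macros are hypothetical simplifications, not the rculist.h definitions, and rely on the GCC/Clang __typeof__ extension.

```c
#include <assert.h>
#include <stddef.h>

struct toy_list_head {
	struct toy_list_head *next;
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Plain-load analogue of list_for_each_entry_rcu_notrace(). */
#define toy_list_for_each_entry(pos, head, member)                          \
	for (pos = toy_container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);                                          \
	     pos = toy_container_of(pos->member.next, __typeof__(*pos), member))

struct toy_item {
	int value;
	struct toy_list_head node;
};

/* Sum values in a circular list whose head is not itself an item. */
static int toy_sum(struct toy_list_head *head)
{
	struct toy_item *pos;
	int sum = 0;

	toy_list_for_each_entry(pos, head, node)
		sum += pos->value;
	return sum;
}
```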

Signed-off-by: Alexey Kardashevskiy 
---
 include/linux/rculist.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 8183b46..a155774 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -253,6 +253,25 @@ static inline void list_splice_init_rcu(struct list_head *list,
 })
 
 /**
+ * list_entry_rcu_notrace - get the struct for this entry
+ * @ptr:the &struct list_head pointer.
+ * @type:   the type of the struct this is embedded in.
+ * @member: the name of the list_struct within the struct.
+ *
+ * This primitive may safely run concurrently with the _rcu list-mutation
+ * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
+ *
+ * This is the same as list_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_entry_rcu_notrace(ptr, type, member) \
+({ \
+   typeof(*ptr) __rcu *__ptr = (typeof(*ptr) __rcu __force *)ptr; \
+   container_of((typeof(ptr))rcu_dereference_raw_notrace(__ptr), \
+   type, member); \
+})
+
+/**
  * Where are list_empty_rcu() and list_first_entry_rcu()?
  *
  * Implementing those functions following their counterparts list_empty() and
@@ -308,6 +327,25 @@ static inline void list_splice_init_rcu(struct list_head *list,
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
 
 /**
+ * list_for_each_entry_rcu_notrace - iterate over rcu list of given type
+ * @pos:   the type * to use as a loop cursor.
+ * @head:  the head for your list.
+ * @member:the name of the list_struct within the struct.
+ *
+ * This list-traversal primitive may safely run concurrently with
+ * the _rcu list-mutation primitives such as list_add_rcu()
+ * as long as the traversal is guarded by rcu_read_lock().
+ *
+ * This is the same as list_for_each_entry_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define list_for_each_entry_rcu_notrace(pos, head, member) \
+   for (pos = list_entry_rcu_notrace((head)->next, typeof(*pos), member); \
+   &pos->member != (head); \
+   pos = list_entry_rcu_notrace(pos->member.next, typeof(*pos), \
+   member))
+
+/**
  * list_for_each_entry_continue_rcu - continue iteration over list of given type
  * @pos:   the type * to use as a loop cursor.
  * @head:  the head for your list.
-- 
2.0.0
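The two macros added above are container_of() wrappers that skip RCU debugging and tracing. As a rough userspace sketch of what they compute, the code below walks an intrusive list and recovers each enclosing structure from its embedded struct list_head. Assumptions: there is no RCU here, so plain pointer loads stand in for rcu_dereference_raw_notrace(); struct item, list_init(), sum_items() and list_for_each_entry_plain() are hypothetical, and list_add_tail()/container_of() are simplified local re-implementations, not the kernel's definitions.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head (no RCU, no barriers). */
struct list_head {
    struct list_head *next, *prev;
};

/* Walk back from an embedded member to the structure that contains it. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical element type embedding a list_head. */
struct item {
    int value;
    struct list_head node;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Local re-implementation: insert 'entry' just before 'head' (at the tail). */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Same loop shape as list_for_each_entry_rcu_notrace(), but with plain
 * loads where the kernel macro uses rcu_dereference_raw_notrace(). */
#define list_for_each_entry_plain(pos, head, member)                     \
    for (pos = container_of((head)->next, __typeof__(*pos), member);     \
         &pos->member != (head);                                         \
         pos = container_of(pos->member.next, __typeof__(*pos), member))

/* Sum the values of all items on the list. */
static int sum_items(struct list_head *head)
{
    struct item *pos;
    int sum = 0;

    list_for_each_entry_plain(pos, head, node)
        sum += pos->value;
    return sum;
}
```

The sketch says nothing about concurrency; in the kernel, the _notrace variants still have to run under rcu_read_lock(), as their comments state.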

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v4 16/16] vfio: powerpc/spapr: Enable Dynamic DMA windows

2014-07-30 Thread Alexey Kardashevskiy
On 07/30/2014 07:31 PM, Alexey Kardashevskiy wrote:
> This defines and implements VFIO IOMMU API required to support
> Dynamic DMA windows defined in the SPAPR specification. The ioctl handlers
> implement the host-side part of the corresponding RTAS calls:
> - VFIO_IOMMU_SPAPR_TCE_QUERY - ibm,query-pe-dma-window;
> - VFIO_IOMMU_SPAPR_TCE_CREATE - ibm,create-pe-dma-window;
> - VFIO_IOMMU_SPAPR_TCE_REMOVE - ibm,remove-pe-dma-window;
> - VFIO_IOMMU_SPAPR_TCE_RESET - ibm,reset-pe-dma-window.
> 
> The VFIO IOMMU driver does basic sanity checks and calls corresponding
> SPAPR TCE functions. At the moment only IODA2 (POWER8 PCI host bridge)
> implements them.
> 
> This advertises VFIO_IOMMU_SPAPR_TCE_FLAG_DDW capability via
> VFIO_IOMMU_SPAPR_TCE_GET_INFO.
> 
> This calls reset() when IOMMU is being disabled (happens when VFIO stops
> using it).
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c |   1 +
>  drivers/vfio/vfio_iommu_spapr_tce.c   | 173 +-
>  include/uapi/linux/vfio.h |  37 ++-
>  3 files changed, 209 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 6a847b2..f51afe2 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -859,6 +859,7 @@ static long pnv_pci_ioda2_ddw_create(struct spapr_tce_iommu_group *data,
>  
>   /* Copy "invalidate" register address */
>   tbl64->it_index = pe->tce32.table.it_index;
> + tbl64->it_group = pe->tce32.table.it_group;

Just noticed: this does not belong here; it must be moved to an earlier patch.


>   tbl64->it_type = TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE |
>   TCE_PCI_SWINV_PAIR;
>   tbl64->it_map = (void *) 0xDEADBEEF; /* poison */
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 48b256c..32e2804 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -45,6 +45,7 @@ struct tce_container {
>   struct mutex lock;
>   struct iommu_group *grp;
>   bool enabled;
> + unsigned long start64;
>  };
>  
>  
> @@ -123,19 +124,36 @@ static void tce_iommu_disable(struct tce_container *container)
>  
>   container->enabled = false;
>  
> - if (!container->grp || !current->mm)
> + if (!container->grp)
>   return;
>  
>   data = iommu_group_get_iommudata(container->grp);
>   if (!data || !data->iommu_owner || !data->ops->get_table)
>   return;
>  
> + /* Try resetting, there might have been a 64bit window */
> + if (data->ops->reset)
> + data->ops->reset(data);
> +
> + if (!current->mm)
> + return;
> +
>   tbl = data->ops->get_table(data, TCE_DEFAULT_WINDOW);
>   if (!tbl)
>   return;
>  
>   decrement_locked_vm((tbl->it_size << tbl->it_page_shift) >>
>   PAGE_SHIFT);
> +
> + if (!container->start64)
> + return;
> +
> + tbl = data->ops->get_table(data, container->start64);
> + if (!tbl)
> + return;
> +
> + decrement_locked_vm((tbl->it_size << tbl->it_page_shift) >>
> + PAGE_SHIFT);
>  }
>  
>  static void *tce_iommu_open(unsigned long arg)
> @@ -210,6 +228,8 @@ static long tce_iommu_ioctl(void *iommu_data,
>   info.dma32_window_start = tbl->it_offset << tbl->it_page_shift;
>   info.dma32_window_size = tbl->it_size << tbl->it_page_shift;
>   info.flags = 0;
> + if (data->ops->query && data->ops->create && data->ops->remove)
> + info.flags |= VFIO_IOMMU_SPAPR_TCE_FLAG_DDW;
>  
>   if (copy_to_user((void __user *)arg, &info, minsz))
>   return -EFAULT;
> @@ -335,6 +355,157 @@ static long tce_iommu_ioctl(void *iommu_data,
>   tce_iommu_disable(container);
>   mutex_unlock(&container->lock);
>   return 0;
> +
> + case VFIO_IOMMU_SPAPR_TCE_QUERY: {
> + struct vfio_iommu_spapr_tce_query query;
> + struct spapr_tce_iommu_group *data;
> +
> + if (WARN_ON(!container->grp))
> + return -ENXIO;
> +
> + data = iommu_group_get_iommudata(container->grp);
> +
> + minsz = offsetofend(struct vfio_iommu_spapr_tce_query,
> + page_size_mask);
> +
> + if (copy_from_user(&query, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (query.argsz < minsz)
> + return -EINVAL;
> +
> + if (!data->ops->query || !data->iommu_owner)
> + return -ENOSYS;
> +
> + ret = data->ops->query(data,
> + &query.windows_available,
> + &quer
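In tce_iommu_disable() above, the amount released with decrement_locked_vm() for a window is (it_size << it_page_shift) >> PAGE_SHIFT, and with this patch the 64-bit window (when start64 is set) is released as well. A small arithmetic sketch of that accounting follows; the window sizes are hypothetical, PAGE_SHIFT = 16 (64 KiB system pages) is an assumption, and the helper names are illustrative rather than taken from the patch.

```c
#include <stdint.h>

/* Pages accounted for one TCE window, mirroring
 * (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT:
 * it_size is the number of TCE entries, it_page_shift is log2 of the
 * IOMMU page size, page_shift stands in for the kernel's PAGE_SHIFT. */
static uint64_t tce_window_locked_pages(uint64_t it_size,
                                        unsigned int it_page_shift,
                                        unsigned int page_shift)
{
    return (it_size << it_page_shift) >> page_shift;
}

/* Hypothetical helper: total released when the container is disabled,
 * i.e. the default window plus the 64-bit window if one was created. */
static uint64_t tce_container_locked_pages(uint64_t it_size32, unsigned int shift32,
                                           uint64_t it_size64, unsigned int shift64,
                                           int has_window64, unsigned int page_shift)
{
    uint64_t pages = tce_window_locked_pages(it_size32, shift32, page_shift);

    if (has_window64)
        pages += tce_window_locked_pages(it_size64, shift64, page_shift);
    return pages;
}
```

For example, a 1 GiB default window of 4 KiB TCEs (262144 entries) accounts for 16384 pages of 64 KiB, and a hypothetical 16 GiB 64-bit window of 64 KiB TCEs (also 262144 entries) adds another 262144.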

[PATCH RFC] ASoC: fsl: Add Freescale Generic ASoC Sound Card with ASRC support

2014-07-30 Thread Nicolin Chen
The Freescale Generic ASoC Sound Card is a general ASoC DAI Link driver that
can be used, ideally, for all Freescale CPU DAI drivers and external CODECs.

The idea of this generic sound card is a bit like ASoC Simple Card. However,
most Freescale SoCs (especially those released in recent years) have an ASRC
(Documentation/devicetree/bindings/sound/fsl,asrc.txt) inside, and this is a
specific feature that would be painful to control from, and merge into, the
Simple Card driver.

So having this driver will allow all Freescale SoC users to benefit from
simpler support for new cards and from the wide sample rate support provided
by the ASRC.

The driver is initially designed for sound cards using I2S or PCM DAI formats.
However, it's also possible to merge non-I2S/PCM sound cards, such as S/PDIF
audio and HDMI audio, into this card, as long as the merge does not break the
original function and there is common code that can be abstracted along with
the I2S-type sound cards.

As an initial version, it only supports three cards that I can test:
imx-audio-cs42888, a new card that links ESAI with CS42888 CODEC
imx-audio-sgtl5000, just like the old imx-sgtl5000.c driver
imx-audio-wm8962, just like the old imx-wm8962.c driver

The driver is also compatible with the old Device Tree bindings of WM8962 and
SGTL5000, so we may consider removing those two drivers once this driver is
fully enabled (it still needs to be added to the defconfig).

Signed-off-by: Nicolin Chen 
---
 .../devicetree/bindings/sound/fsl-asoc-card.txt|  82 +++
 sound/soc/fsl/Kconfig  |  16 +
 sound/soc/fsl/Makefile |   2 +
 sound/soc/fsl/fsl-asoc-card.c  | 573 +
 4 files changed, 673 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/fsl-asoc-card.txt
 create mode 100644 sound/soc/fsl/fsl-asoc-card.c

diff --git a/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt b/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt
new file mode 100644
index 000..a96774c
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt
@@ -0,0 +1,82 @@
+Freescale Generic ASoC Sound Card with ASRC support
+
+The Freescale Generic ASoC Sound Card can be used, ideally, for all Freescale
+SoCs connecting with external CODECs.
+
+The idea of this generic sound card is a bit like ASoC Simple Card. However,
+most Freescale SoCs (especially those released in recent years) have an ASRC
+(Documentation/devicetree/bindings/sound/fsl,asrc.txt) inside, and this is a
+specific feature that would be painful to control from, and merge into, the
+Simple Card.
+
+So having this generic sound card allows all Freescale SoC users to benefit
+from simpler support for new cards and from the wide sample rate support
+provided by the ASRC.
+
+Note: The card is initially designed for sound cards that use the I2S and
+  PCM DAI formats. However, it will also be possible to support non-I2S/PCM
+  sound cards, such as S/PDIF audio and HDMI audio, as long as the driver
+  has been properly upgraded.
+
+
+The compatible list for this generic sound card currently contains:
+ "fsl,imx-audio-cs42888"
+
+ "fsl,imx-audio-wm8962"
+ (compatible with Documentation/devicetree/bindings/sound/imx-audio-wm8962.txt)
+
+ "fsl,imx-audio-sgtl5000"
+ (compatible with Documentation/devicetree/bindings/sound/imx-audio-sgtl5000.txt)
+
+Required properties:
+
+  - compatible : Contains one of the entries in the compatible list.
+
+  - model  : The user-visible name of this sound complex
+
+  - audio-cpu  : The phandle of a CPU DAI controller
+
+  - audio-codec: The phandle of an audio codec
+
+  - audio-routing  : A list of the connections between audio components.
+ Each entry is a pair of strings, the first being the
+ connection's sink, the second being the connection's
+ source. There are a few pre-designed board connectors:
+  * Line Out Jack
+  * Line In Jack
+  * Headphone Jack
+  * Mic Jack
+  * Ext Spk
+  * AMIC (stands for Analog Microphone Jack)
+  * DMIC (stands for Digital Microphone Jack)
+
+ Note: "Mic Jack" and "AMIC" are redundant; both
+   are kept in order to support the old bindings
+   of wm8962 and sgtl5000.
+
+Optional properties:
+
+  - audio-asrc : The phandle of ASRC. It can be absent if there's no
+ need to add ASRC support via DPCM.
+
+Example:
+sound-cs42888 {
+   compatible = "fsl,imx-audio-cs42888";
+   model = "cs42888-audio";
+   audio-cpu = <&
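The audio-routing property in the binding above is a flat list of strings read two at a time, sink first and source second. The kernel already provides snd_soc_of_parse_audio_routing() for this; the following is only an illustrative sketch of the pairing rule, not that function. struct route_pair, parse_audio_routing() and the codec pin names in the usage note are hypothetical.

```c
#include <stddef.h>

/* One DAPM route as described by the binding: sink first, then source. */
struct route_pair {
    const char *sink;
    const char *source;
};

/* Split a flat "audio-routing" string list into sink/source pairs.
 * Returns the number of pairs written, or -1 if the list has an odd
 * number of entries or does not fit into 'out'. */
static int parse_audio_routing(const char *const *strings, size_t count,
                               struct route_pair *out, size_t out_len)
{
    if (count % 2 != 0 || count / 2 > out_len)
        return -1;

    for (size_t i = 0; i < count / 2; i++) {
        out[i].sink = strings[2 * i];        /* even index: sink */
        out[i].source = strings[2 * i + 1];  /* odd index: source */
    }
    return (int)(count / 2);
}
```

Given { "Headphone Jack", "HPOUTL", "Headphone Jack", "HPOUTR" }, this yields two routes whose sink is "Headphone Jack"; a list with an odd number of strings is rejected rather than silently truncated.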

[PATCH 3.12 71/94] locking/mutex: Disable optimistic spinning on some architectures

2014-07-30 Thread Jiri Slaby
From: Peter Zijlstra 

3.12-stable review patch.  If anyone has any objections, please let me know.

===

commit 4badad352a6bb202ec68afa7a574c0bb961e5ebc upstream.

The optimistic spin code assumes regular stores and cmpxchg() play nice;
this is found to not be true for at least: parisc, sparc32, tile32,
metag-lock1, arc-!llsc and hexagon.

There is further wreckage, but this in particular seemed easy to
trigger, so blacklist this.

Opt in for known good archs.

Signed-off-by: Peter Zijlstra 
Reported-by: Mikulas Patocka 
Cc: David Miller 
Cc: Chris Metcalf 
Cc: James Bottomley 
Cc: Vineet Gupta 
Cc: Jason Low 
Cc: Waiman Long 
Cc: "James E.J. Bottomley" 
Cc: Paul McKenney 
Cc: John David Anglin 
Cc: James Hogan 
Cc: Linus Torvalds 
Cc: Davidlohr Bueso 
Cc: Benjamin Herrenschmidt 
Cc: Catalin Marinas 
Cc: Russell King 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparcli...@vger.kernel.org
Link: http://lkml.kernel.org/r/20140606175316.gv13...@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Jiri Slaby 
---
 arch/arm/Kconfig | 1 +
 arch/arm64/Kconfig   | 1 +
 arch/powerpc/Kconfig | 1 +
 arch/sparc/Kconfig   | 1 +
 arch/x86/Kconfig | 1 +
 kernel/Kconfig.locks | 5 -
 6 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e47fcd1e9645..99e1ce978cf9 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -5,6 +5,7 @@ config ARM
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_CUSTOM_GPIO_H
+   select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_WANT_IPC_PARSE_VERSION
select BUILDTIME_EXTABLE_SORT if MMU
select CLONE_BACKWARDS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c04454876bcb..fe70eaea0e28 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1,6 +1,7 @@
 config ARM64
def_bool y
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
+   select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_WANT_OPTIONAL_GPIOLIB
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
select ARCH_WANT_FRAME_POINTERS
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d5d026b6d237..2e0ddfadc0b9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -138,6 +138,7 @@ config PPC
select OLD_SIGSUSPEND
select OLD_SIGACTION if PPC32
select HAVE_DEBUG_STACKOVERFLOW
+   select ARCH_SUPPORTS_ATOMIC_RMW
 
 config EARLY_PRINTK
bool
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 4e5683877b93..d60f34dbae89 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -75,6 +75,7 @@ config SPARC64
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select HAVE_C_RECORDMCOUNT
select NO_BOOTMEM
+   select ARCH_SUPPORTS_ATOMIC_RMW
 
 config ARCH_DEFCONFIG
string
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eb2dfa61eabe..9dc1a24d41b8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -123,6 +123,7 @@ config X86
select COMPAT_OLD_SIGACTION if IA32_EMULATION
select RTC_LIB
select HAVE_DEBUG_STACKOVERFLOW
+   select ARCH_SUPPORTS_ATOMIC_RMW
 
 config INSTRUCTION_DECODER
def_bool y
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index d2b32ac27a39..ecee67a00f5f 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -220,6 +220,9 @@ config INLINE_WRITE_UNLOCK_IRQRESTORE
 
 endif
 
+config ARCH_SUPPORTS_ATOMIC_RMW
+   bool
+
 config MUTEX_SPIN_ON_OWNER
def_bool y
-   depends on SMP && !DEBUG_MUTEXES
+   depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW
-- 
2.0.1
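The commit message says optimistic spinning relies on regular stores and cmpxchg() "playing nice". The simplified C11 sketch below shows the two operations involved: a compare-and-swap to acquire and a plain store to release. It is not the kernel's mutex code. The names are hypothetical, and there is no owner-running check, no cpu_relax(), no queueing and no sleeping.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock word: NULL when free, otherwise the owner's token. */
struct spin_mutex {
    _Atomic(void *) owner;
};

/* One acquisition attempt with compare-and-swap (the cmpxchg() side). */
static bool spin_mutex_trylock(struct spin_mutex *m, void *self)
{
    void *expected = NULL;
    return atomic_compare_exchange_strong(&m->owner, &expected, self);
}

/* Optimistically spin for at most max_spins attempts; a real mutex would
 * fall back to sleeping when this returns false. */
static bool spin_mutex_lock_optimistic(struct spin_mutex *m, void *self,
                                       int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(&m->owner, memory_order_relaxed) == NULL &&
            spin_mutex_trylock(m, self))
            return true;
    }
    return false;
}

/* Release with a plain (release-ordered) store: the "regular store" side
 * that must interact correctly with concurrent compare-and-swap. */
static void spin_mutex_unlock(struct spin_mutex *m)
{
    atomic_store_explicit(&m->owner, NULL, memory_order_release);
}
```

In C11, both operations act on the same atomic object, so they compose correctly. The commit reports that on parisc, sparc32, tile32, metag-lock1, arc-!llsc and hexagon the kernel's equivalents did not, which is why MUTEX_SPIN_ON_OWNER now also depends on ARCH_SUPPORTS_ATOMIC_RMW.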


Re: [PATCH 4/4] ASoC: fsl_ssi: Add stream names for DPCM usage

2014-07-30 Thread Timur Tabi

Nicolin Chen wrote:

DPCM needs extra DAPM routes in the machine driver to route audio
between Front-End and Back-End. In order to distinguish the stream names
in the route map from those of the CODECs, we add specific stream names to
the SSI driver so that we can implement ASRC via DPCM with it.

Signed-off-by: Nicolin Chen


Acked-by: Timur Tabi 

Re: [2/2] powerpc/fsl-booke: Add initial T1042RDB_PI board support

2014-07-30 Thread Scott Wood
On Wed, Jul 09, 2014 at 09:54:11AM +0530, Priyanka Jain wrote:
> diff --git a/arch/powerpc/boot/dts/t104xrdb.dtsi b/arch/powerpc/boot/dts/t104xrdb.dtsi
> index 9aaefa5..e7e765f 100644
> --- a/arch/powerpc/boot/dts/t104xrdb.dtsi
> +++ b/arch/powerpc/boot/dts/t104xrdb.dtsi
> @@ -57,7 +57,8 @@
>   };
>  
>   cpld@3,0 {
> - compatible = "fsl,t1040rdb-cpld","fsl,t1042rdb-cpld";
> + compatible = "fsl,t1040rdb-cpld","fsl,t1042rdb-cpld",
> + "fsl,t1042rdb_pi-cpld";
>   reg = <3 0 0x300>;
>   };
>   };

What's going on here?  This file is used by all three boards.  If you
need to distinguish one board's CPLD from another's, you'll have to do it
somewhere else.  If the CPLDs are exactly the same and no distinction
needs to be made, then you don't need three compatible strings.  Even
then, you may wish to specify the exact board as the first compatible
string, but again you'll need to patch that in elsewhere so that it
actually matches the board.

-Scott

Re: [PATCH v2 5/7] powerpc/corenet: Add MDIO bus muxing support to the board device tree(s)

2014-07-30 Thread Emil Medve
Hello Scott,


On 07/29/2014 02:58 PM, Scott Wood wrote:
> On Mon, 2014-07-28 at 06:51 +, Emil Medve wrote:
>> Hello Scott,
>>
>>
>> Scott Wood  freescale.com> writes:
>>> On Wed, 2014-07-16 at 15:17 -0500, Shruti Kanetkar wrote:
 +  mdio@fd000 {
 +  /* For 10g interfaces */
 +  phy_xaui_slot1: xaui-phy@slot1 {
 +  status = "disabled";
 +  compatible = "ethernet-phy-ieee802.3-c45";
 +  reg = <0x7>; /* default switch setting on slot1 of AMC2PEX */
 +  };
>>>
>>> Why xaui-phy and not ethernet-phy?
>>>
>>> As for the device_type discussion from v1, there is a generic binding
>>> that says device_type "should" be ethernet-phy.
>>
>> I have no strong feelings about this and we can use ethernet-phy, but:
>>
>> 1. The binding is old/stale (?) as it still uses device_type, and the kernel
>> doesn't seem to use the device_type for PHY(s) anymore
> 
> Yes.
> 
>> 2. The binding asks "ethernet-phy" for the device_type property, not for the
>> name. As such TBI PHY(s) use (upstream) the tbi-phy@ node name
> 
> It shows ethernet-phy as the name in the example.  ePAPR urges generic
> node names (this was also a recommendation for IEEE1275), and has
> ethernet-phy on the preferred list.  Is a xaui-phy not an ethernet phy?

So you think somebody should clean up all the sgmii-phy and tbi-phy
node names, huh?

It seems that a number of tbi-phy instances slipped by you:

1be62c6 powerpc/mpc85xx: Add BSC9132 QDS Support
bf57aeb powerpc/85xx: add the P1020RDB-PD DTS support
8a6be2b powerpc/85xx: Add TWR-P1025 board support

 +  mdio0: mdio@fc000 {
 +  };
>>>
>>> Why is the empty node needed?
>>
>> For the label
> 
> For mdio-parent-bus, or is there some other dts layer that makes this
> node non-empty?

'powerpc/corenet: Create the dts components for the DPAA FMan' -
http://patchwork.ozlabs.org/patch/370872 and 'powerpc/corenet: Add DPAA
FMan support to the SoC device tree(s)' -
http://patchwork.ozlabs.org/patch/370868 add content to said node


Cheers,

Re: [PATCH v2 5/7] powerpc/corenet: Add MDIO bus muxing support to the board device tree(s)

2014-07-30 Thread Scott Wood
On Wed, 2014-07-30 at 16:52 -0500, Emil Medve wrote:
> Hello Scott,
> 
> 
> On 07/29/2014 02:58 PM, Scott Wood wrote:
> > On Mon, 2014-07-28 at 06:51 +, Emil Medve wrote:
> >> Hello Scott,
> >>
> >>
> >> Scott Wood  freescale.com> writes:
> >>> On Wed, 2014-07-16 at 15:17 -0500, Shruti Kanetkar wrote:
>  +mdio@fd000 {
>  +/* For 10g interfaces */
>  +phy_xaui_slot1: xaui-phy@slot1 {
>  +status = "disabled";
>  +compatible = "ethernet-phy-ieee802.3-c45";
>  +reg = <0x7>; /* default switch setting on slot1 of AMC2PEX */
>  +};
> >>>
> >>> Why xaui-phy and not ethernet-phy?
> >>>
> >>> As for the device_type discussion from v1, there is a generic binding
> >>> that says device_type "should" be ethernet-phy.
> >>
> >> I have no strong feelings about this and we can use ethernet-phy, but:
> >>
> >> 1. The binding is old/stale (?) as it still uses device_type, and the kernel
> >> doesn't seem to use the device_type for PHY(s) anymore
> > 
> > Yes.
> > 
> >> 2. The binding asks "ethernet-phy" for the device_type property, not for the
> >> name. As such TBI PHY(s) use (upstream) the tbi-phy@ node name
> > 
> > It shows ethernet-phy as the name in the example.  ePAPR urges generic
> > node names (this was also a recommendation for IEEE1275), and has
> > ethernet-phy on the preferred list.  Is a xaui-phy not an ethernet phy?
> 
> So you think somebody should clean up all the sgmii-phy and tbi-phy
> node names, huh?

No, I was just wondering why we're adding yet another name, and whether
there's any value in it.

> It seems that a number of tbi-phy instances slipped by you:
> 
> 1be62c6 powerpc/mpc85xx: Add BSC9132 QDS Support
> bf57aeb powerpc/85xx: add the P1020RDB-PD DTS support
> 8a6be2b powerpc/85xx: Add TWR-P1025 board support

tbi-phy is existing practice.  xaui-phy isn't.

>  +mdio0: mdio@fc000 {
>  +};
> >>>
> >>> Why is the empty node needed?
> >>
> >> For the label
> > 
> > For mdio-parent-bus, or is there some other dts layer that makes this
> > node non-empty?
> 
> 'powerpc/corenet: Create the dts components for the DPAA FMan' -
> http://patchwork.ozlabs.org/patch/370872

Why does this patch define the mdio0 label for mdio@e1120, but not
define a label for any other node?

>  and 'powerpc/corenet: Add DPAA
> FMan support to the SoC device tree(s)' -
> http://patchwork.ozlabs.org/patch/370868 add content to said node

This one adds content to some mdio nodes, none of which are mdio@fc000
or &mdio0.

-Scott



[PATCH] ASoC: fsl_asrc: Fix sparse warnings in FSL_ASRC_FORMATS due to typo

2014-07-30 Thread Nicolin Chen
reproduce: make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

>> sound/soc/fsl/fsl_asrc.c:563:28: sparse: restricted snd_pcm_format_t degrades to integer
>> sound/soc/fsl/fsl_asrc.c:570:28: sparse: restricted snd_pcm_format_t degrades to integer

vim +563 sound/soc/fsl/fsl_asrc.c

  557  .probe = fsl_asrc_dai_probe,
  558  .playback = {
  559  .stream_name = "ASRC-Playback",
  560  .channels_min = 1,
  561  .channels_max = 10,
  562  .rates = FSL_ASRC_RATES,
> 563  .formats = FSL_ASRC_FORMATS,
  564  },
  565  .capture = {
  566  .stream_name = "ASRC-Capture",
  567  .channels_min = 1,
  568  .channels_max = 10,
  569  .rates = FSL_ASRC_RATES,
> 570  .formats = FSL_ASRC_FORMATS,
  571  },
  572  .ops = &fsl_asrc_dai_ops,
  573  };

Reported-by: kbuild test robot 
Signed-off-by: Nicolin Chen 
---
 sound/soc/fsl/fsl_asrc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index 41699f7..cdb5779 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -551,7 +551,7 @@ static int fsl_asrc_dai_probe(struct snd_soc_dai *dai)
 #define FSL_ASRC_RATES  SNDRV_PCM_RATE_8000_192000
 #define FSL_ASRC_FORMATS   (SNDRV_PCM_FMTBIT_S24_LE | \
 SNDRV_PCM_FMTBIT_S16_LE | \
-SNDRV_PCM_FORMAT_S20_3LE)
+SNDRV_PCM_FMTBIT_S20_3LE)
 
 static struct snd_soc_dai_driver fsl_asrc_dai = {
.probe = fsl_asrc_dai_probe,
-- 
1.8.4
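The sparse warning comes from OR-ing a format index (SNDRV_PCM_FORMAT_S20_3LE, a restricted snd_pcm_format_t) into a mask of format bits. The sketch below shows what the typo does to the mask. It uses local stand-ins and assumes they match ALSA's numbering (S16_LE = 2, S24_LE = 6, S20_3LE = 36) and that a FMTBIT is 1ULL shifted left by the format index. The DEMO_* names and both functions are hypothetical.

```c
#include <stdint.h>

/* Local stand-ins for ALSA format indices (assumed to match
 * SNDRV_PCM_FORMAT_* in include/uapi/sound/asound.h). */
enum demo_pcm_format {
    DEMO_FORMAT_S16_LE  = 2,
    DEMO_FORMAT_S24_LE  = 6,
    DEMO_FORMAT_S20_3LE = 36,
};

/* A format bit is a one-hot bit at the format's index. */
#define DEMO_FMTBIT(fmt) (1ULL << (fmt))

/* With the typo: the raw index 36 (binary 100100) is OR-ed in,
 * which sets bits 2 and 5 instead of bit 36. */
static uint64_t buggy_formats(void)
{
    return DEMO_FMTBIT(DEMO_FORMAT_S24_LE) | DEMO_FMTBIT(DEMO_FORMAT_S16_LE) |
           DEMO_FORMAT_S20_3LE;
}

/* After the fix: every term is a format bit. */
static uint64_t fixed_formats(void)
{
    return DEMO_FMTBIT(DEMO_FORMAT_S24_LE) | DEMO_FMTBIT(DEMO_FORMAT_S16_LE) |
           DEMO_FMTBIT(DEMO_FORMAT_S20_3LE);
}
```

With the typo, the mask is 0x64. That covers bits 2 and 6 as intended, plus a stray bit 5 taken from the binary form of 36, while bit 36 (S20_3LE) is never set. The fixed mask is 0x1000000044.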


RE: [2/2] powerpc/fsl-booke: Add initial T1042RDB_PI board support

2014-07-30 Thread Priyanka Jain


-Original Message-
From: Wood Scott-B07421 
Sent: Thursday, July 31, 2014 1:43 AM
To: Jain Priyanka-B32167
Cc: devicet...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Aggrwal 
Poonam-B10812; Kushwaha Prabhakar-B32579
Subject: Re: [2/2] powerpc/fsl-booke: Add initial T1042RDB_PI board support

On Wed, Jul 09, 2014 at 09:54:11AM +0530, Priyanka Jain wrote:
> diff --git a/arch/powerpc/boot/dts/t104xrdb.dtsi b/arch/powerpc/boot/dts/t104xrdb.dtsi
> index 9aaefa5..e7e765f 100644
> --- a/arch/powerpc/boot/dts/t104xrdb.dtsi
> +++ b/arch/powerpc/boot/dts/t104xrdb.dtsi
> @@ -57,7 +57,8 @@
>   };
>  
>   cpld@3,0 {
> - compatible = "fsl,t1040rdb-cpld","fsl,t1042rdb-cpld";
> + compatible = "fsl,t1040rdb-cpld","fsl,t1042rdb-cpld",
> + "fsl,t1042rdb_pi-cpld";
>   reg = <3 0 0x300>;
>   };
>   };

What's going on here?  This file is used by all three boards.  If you need to 
distinguish one board's CPLD from another's, you'll have to do it somewhere 
else.  If the CPLDs are exactly the same and no distinction needs to be made, 
then you don't need three compatible strings.  Even then, you may wish to 
specify the exact board as the first compatible string, but again you'll need 
to patch that in elsewhere so that it actually matches the board.
As the CPLD register set is the same for all three boards, I am thinking of
replacing this with t104xrdb-cpld:
compatible = "fsl,t104xrdb-cpld","
Is this OK?

-Scott

Re: [PATCH v2 5/7] powerpc/corenet: Add MDIO bus muxing support to the board device tree(s)

2014-07-30 Thread Emil Medve
Hello Scott,


On 07/30/2014 09:30 PM, Scott Wood wrote:
> On Wed, 2014-07-30 at 16:52 -0500, Emil Medve wrote:
>> Hello Scott,
>>
>>
>> On 07/29/2014 02:58 PM, Scott Wood wrote:
>>> On Mon, 2014-07-28 at 06:51 +, Emil Medve wrote:
 Hello Scott,


 Scott Wood  freescale.com> writes:
> On Wed, 2014-07-16 at 15:17 -0500, Shruti Kanetkar wrote:
>> +mdio@fd000 {
>> +/* For 10g interfaces */
>> +phy_xaui_slot1: xaui-phy@slot1 {
>> +status = "disabled";
>> +compatible = "ethernet-phy-ieee802.3-c45";
>> +reg = <0x7>; /* default switch setting on slot1 of AMC2PEX */
>> +};
>
> Why xaui-phy and not ethernet-phy?
>
> As for the device_type discussion from v1, there is a generic binding
> that says device_type "should" be ethernet-phy.

 I have no strong feelings about this and we can use ethernet-phy, but:

 1. The binding is old/stale (?) as it still uses device_type, and the kernel
 doesn't seem to use the device_type for PHY(s) anymore
>>>
>>> Yes.
>>>
 2. The binding asks "ethernet-phy" for the device_type property, not for the
 name. As such TBI PHY(s) use (upstream) the tbi-phy@ node name
>>>
>>> It shows ethernet-phy as the name in the example.  ePAPR urges generic
>>> node names (this was also a recommendation for IEEE1275), and has
>>> ethernet-phy on the preferred list.  Is a xaui-phy not an ethernet phy?
>>
>> So you think somebody should clean up all the sgmii-phy and tbi-phy
>> node names, huh?
> 
> No, I was just wondering why we're adding yet another name, and whether
> there's any value in it.

That's fair. We'll just use ethernet-phy

>> It seems that a number of tbi-phy instances slipped by you:
>>
>> 1be62c6 powerpc/mpc85xx: Add BSC9132 QDS Support
>> bf57aeb powerpc/85xx: add the P1020RDB-PD DTS support
>> 8a6be2b powerpc/85xx: Add TWR-P1025 board support
> 
> tbi-phy is existing practice.  xaui-phy isn't.
> 
>> +mdio0: mdio@fc000 {
>> +};
>
> Why is the empty node needed?

 For the label
>>>
>>> For mdio-parent-bus, or is there some other dts layer that makes this
>>> node non-empty?
>>
>> 'powerpc/corenet: Create the dts components for the DPAA FMan' -
>> http://patchwork.ozlabs.org/patch/370872
> 
> Why does this patch define the mdio0 label for mdio@e1120, but not
> define a label for any other node?

Only MDIO controllers that are pinned out have these labels. Only pinned
out MDIO(s) are capable of controlling external PHY(s) via these board
level MDIO buses

>>  and 'powerpc/corenet: Add DPAA
>> FMan support to the SoC device tree(s)' -
>> http://patchwork.ozlabs.org/patch/370868 add content to said node
> 
> This one adds content to some mdio nodes, none of which are mdio@fc000
> or &mdio0.

This patch adds the SoC level PHY(s), which in this case are just TBI
PHY(s): i.e. no FMan v2 10 Gb/s MDIO or FMan v3 standalone MDIO devices.
Also the labels become relevant only at board level to connect the MDIO
buses to their corresponding MDIO controllers


Cheers,

Re: [2/2] powerpc/fsl-booke: Add initial T1042RDB_PI board support

2014-07-30 Thread Scott Wood
On Wed, 2014-07-30 at 23:37 -0500, Jain Priyanka-B32167 wrote:
> 
> -Original Message-
> From: Wood Scott-B07421 
> Sent: Thursday, July 31, 2014 1:43 AM
> To: Jain Priyanka-B32167
> Cc: devicet...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Aggrwal 
> Poonam-B10812; Kushwaha Prabhakar-B32579
> Subject: Re: [2/2] powerpc/fsl-booke: Add initial T1042RDB_PI board support
> 
> On Wed, Jul 09, 2014 at 09:54:11AM +0530, Priyanka Jain wrote:
> > diff --git a/arch/powerpc/boot/dts/t104xrdb.dtsi b/arch/powerpc/boot/dts/t104xrdb.dtsi
> > index 9aaefa5..e7e765f 100644
> > --- a/arch/powerpc/boot/dts/t104xrdb.dtsi
> > +++ b/arch/powerpc/boot/dts/t104xrdb.dtsi
> > @@ -57,7 +57,8 @@
> > };
> >  
> > cpld@3,0 {
> > -   compatible = "fsl,t1040rdb-cpld","fsl,t1042rdb-cpld";
> > +   compatible = "fsl,t1040rdb-cpld","fsl,t1042rdb-cpld",
> > +   "fsl,t1042rdb_pi-cpld";
> > reg = <3 0 0x300>;
> > };
> > };
> 
> What's going on here?  This file is used by all three boards.  If you need to 
> distinguish one board's CPLD from another's, you'll have to do it somewhere 
> else.  If the CPLDs are exactly the same and no distinction needs to be made, 
> then you don't need three compatible strings.  Even then, you may wish to 
> specify the exact board as the first compatible string, but again you'll need 
> to patch that in elsewhere so that it actually matches the board.
> As the CPLD register set is the same for all three boards, I am thinking of
> replacing this with t104xrdb-cpld:
> compatible = "fsl,t104xrdb-cpld","
> Is this OK?

No.  Wildcards aren't allowed in compatible strings, because you never
know what other devices might exist in the future that match the
wildcard.

If the CPLD logic is truly 100% identical, just pick one of the three to
be the canonical name.

-Scott
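The point about wildcards can be illustrated with a simplified matcher. Each compatible string is compared literally, so "fsl,t104xrdb-cpld" is not a pattern, and a board-specific first entry can fall back to one canonical name. This is a hypothetical sketch, not the kernel's of_match_node(). It simply prefers earlier entries in the node's list, and the table contents are only examples.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the driver-table entry matched by the earliest
 * string in the node's compatible list, or -1 if nothing matches.
 * Matching is exact string comparison; no wildcard expansion. */
static int match_compatible(const char *const *node_compat, size_t n,
                            const char *const *table, size_t m)
{
    for (size_t i = 0; i < n; i++)          /* most specific first */
        for (size_t j = 0; j < m; j++)
            if (strcmp(node_compat[i], table[j]) == 0)
                return (int)j;
    return -1;
}
```

A node listing { "fsl,t1042rdb_pi-cpld", "fsl,t1040rdb-cpld" } matches a driver table that only knows "fsl,t1040rdb-cpld" through its second, canonical entry. A node listing only "fsl,t104xrdb-cpld" matches nothing.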



Re: [PATCH v2 5/7] powerpc/corenet: Add MDIO bus muxing support to the board device tree(s)

2014-07-30 Thread Scott Wood
On Wed, 2014-07-30 at 23:35 -0500, Emil Medve wrote:
> Hello Scott,
> 
> 
> On 07/30/2014 09:30 PM, Scott Wood wrote:
> > On Wed, 2014-07-30 at 16:52 -0500, Emil Medve wrote:
> >> +  mdio0: mdio@fc000 {
> >> +  };
> >
> > Why is the empty node needed?
> 
>  For the label
> >>>
> >>> For mdio-parent-bus, or is there some other dts layer that makes this
> >>> node non-empty?
> >>
> >> 'powerpc/corenet: Create the dts components for the DPAA FMan' -
> >> http://patchwork.ozlabs.org/patch/370872
> > 
> > Why does this patch define the mdio0 label for mdio@e1120, but not
> > define a label for any other node?
> 
> Only MDIO controllers that are pinned out have these labels. Only pinned
> out MDIO(s) are capable of controlling external PHY(s) via these board
> level MDIO buses

Is there any reason to describe non-pinned-out MDIO controllers at all?
Is the lack of pinning out inherent to the silicon, or is it board
design/config?  Is the answer different for different MDIO controllers?
I'm just curious why mdio@e1120 is labelled in a non-board dtsi while
others are labelled elsewhere.

-Scott



Re: [PATCH v2 5/7] powerpc/corenet: Add MDIO bus muxing support to the board device tree(s)

2014-07-30 Thread Emil Medve
Hello Scott,


On 07/31/2014 12:28 AM, Scott Wood wrote:
> On Wed, 2014-07-30 at 23:35 -0500, Emil Medve wrote:
>> Hello Scott,
>>
>>
>> On 07/30/2014 09:30 PM, Scott Wood wrote:
>>> On Wed, 2014-07-30 at 16:52 -0500, Emil Medve wrote:
 +  mdio0: mdio@fc000 {
 +  };
>>>
>>> Why is the empty node needed?
>>
>> For the label
>
> For mdio-parent-bus, or is there some other dts layer that makes this
> node non-empty?

 'powerpc/corenet: Create the dts components for the DPAA FMan' -
 http://patchwork.ozlabs.org/patch/370872
>>>
>>> Why does this patch define the mdio0 label for mdio@e1120, but not
>>> define a label for any other node?
>>
>> Only MDIO controllers that are pinned out have these labels. Only pinned
>> out MDIO(s) are capable of controlling external PHY(s) via these board
>> level MDIO buses
> 
> Is there any reason to describe non-pinned-out MDIO controllers at all?

Yes. For the internal TBI PHY(s). Each MAC supporting SGMII has a TBI
PHY that is attached to the MDIO controller of the respective MAC

> Is the lack of pinning out inherent to the silicon, or is it board
> design/config?

It's a silicon level decision

> Is the answer different for different MDIO controllers?

You mean non-FSL MDIO controllers? Dunno. All FSL SoCs have the same MDIO
pin-out decision

> I'm just curious why mdio@e1120 is labelled in a non-board dtsi while
> others are labelled elsewhere.

Labels are relevant only in the context of 'powerpc/corenet: Add MDIO
bus muxing support to the board device tree(s)' -
http://patchwork.ozlabs.org/patch/370866. Most labels are created and
used in the board .dts file, except for b4qds.dtsi, which is shared between
b4420qds.dts and b4860qds.dts


Cheers,

Re: [PATCH V7 00/17] Enable SRIOV on POWER8

2014-07-30 Thread Benjamin Herrenschmidt
On Thu, 2014-07-24 at 14:22 +0800, Wei Yang wrote:
> This patch set enables the SRIOV on POWER8.

Hi Bjorn !

There are 4 patches in there to the generic code, but so far not much
review from your side of the fence :-)

How do you want to proceed ?

Cheers,
Ben.

> The general idea is to put each VF into its own PE and allocate the required
> resources, such as DMA/MSI.
> 
> One special thing about VF PEs is that we use M64BT to cover the IOV BAR. M64BT
> is a hardware mechanism on the POWER platform that maps MMIO addresses to PEs.
> By using M64BT, we can map each individual VF to a VF PE, which gives users
> more flexibility.
> 
> To achieve this, we need some hacks on the PCI devices' resources.
> 1. Expand the IOV BAR properly.
>Done by pnv_pci_ioda_fixup_iov_resources().
> 2. Shift the IOV BAR properly.
>Done by pnv_pci_vf_resource_shift().
> 3. IOV BAR alignment is the total size instead of an individual size on
>powernv platform.
>Done by pnv_pcibios_sriov_resource_alignment().
> 4. Take the IOV BAR alignment into consideration in the sizing and assigning.
>This is achieved by commit: "PCI: Take additional IOV BAR alignment in
>sizing and assigning"
> 
> Test Environment:
>    The SRIOV devices tested are Emulex Lancer and Mellanox ConnectX-3 on
>    POWER8.
> 
> Example of passing a VF through to a guest via vfio:
>   1. install the necessary modules
>        modprobe vfio
>        modprobe vfio-pci
>   2. retrieve the iommu_group the device belongs to
>        readlink /sys/bus/pci/devices/:06:0d.0/iommu_group
>        ../../../../kernel/iommu_groups/26
>      This means it belongs to group 26
>   3. see how many devices are in this iommu_group
>        ls /sys/kernel/iommu_groups/26/devices/
>   4. unbind the original driver and bind the device to the vfio-pci driver
>        echo :06:0d.0 > /sys/bus/pci/devices/:06:0d.0/driver/unbind
>        echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
>      Note: this should be done for each device in the same iommu_group
>   5. start qemu and pass the device through vfio
>        /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \
>          -M pseries -m 2048 -enable-kvm -nographic \
>          -drive file=/home/ywywyang/kvm/fc19.img \
>          -monitor telnet:localhost:5435,server,nowait -boot cd \
>          -device "spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6"
> 
> Verify that it is really the VF responding:
>   1. ping from a machine in the same subnet (the broadcast domain)
>   2. run arp -n on this machine
>        9.115.251.20 ether   00:00:c9:df:ed:bf   C eth0
>   3. run ifconfig in the guest
>        # ifconfig eth1
>        eth1: flags=4163  mtu 1500
>             inet 9.115.251.20  netmask 255.255.255.0  broadcast 9.115.251.255
>             inet6 fe80::200:c9ff:fedf:edbf  prefixlen 64  scopeid 0x20
>             ether 00:00:c9:df:ed:bf  txqueuelen 1000 (Ethernet)
>             RX packets 175  bytes 13278 (12.9 KiB)
>             RX errors 0  dropped 0  overruns 0  frame 0
>             TX packets 58  bytes 9276 (9.0 KiB)
>             TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>   4. they have the same MAC address
> 
>   Note: make sure you shut down the other network interfaces in the guest.
> 
> ---
> v6 -> v7:
>    1. add the IORESOURCE_ARCH flag for the IOV BAR on the powernv platform.
>    2. when the IOV BAR has the IORESOURCE_ARCH flag, the size is retrieved
>       from hardware directly. If not, it is calculated as usual.
>    3. reorder the patch set, grouping the patches by subsystem:
>       PCI, powerpc, powernv
>    4. rebase on 3.16-rc6
> v5 -> v6:
>    1. remove the pcibios_enable_sriov()/pcibios_disable_sriov() weak
>       functions; similar functionality is moved to
>       pnv_pci_enable_device_hook()/pnv_pci_disable_device_hook(). When the
>       PF is enabled, the platform will try its best to allocate resources
>       for VFs.
>    2. remove the pcibios_sriov_resource_size weak function
>    3. the VF BAR size is retrieved from hardware directly in virtfn_add()
> v4 -> v5:
>    1. merge the SRIOV-related platform functions in machdep_calls and
>       wrap them in one CONFIG_PCI_IOV macro
>    2. define IODA_INVALID_M64 to replace (-1);
>       use this value to indicate that an m64_wins entry is not used
>    3. rename pnv_pci_release_dev_dma() to pnv_pci_ioda2_release_dma_pe();
>       this function is the counterpart of pnv_pci_ioda2_setup_dma_pe()
>    4. change dev_info() to dev_dbg() in pnv_pci_ioda_fixup_iov_resources()
>       to reduce kernel log output
>    5. release the M64 window in pnv_pci_ioda2_release_dma_pe()
> v3 -> v4:
>    1. code format fixes, e.g. lines must not exceed 80 chars
>    2. in commit "ppc/pnv: Add function to deconfig a PE":
>       check that the bus has a bridge before printing the name;
>       remove a PE from its own PELTV
>    3. change the function names for SRIOV resource size/alignment
>    4. rebase on 3.16-rc3
>    5. VFs will n