Re: Packet gets stuck in NOLOCK pfifo_fast qdisc

2020-08-20 Thread Jike Song

Hi Josh,

On Fri, Jul 3, 2020 at 2:14 AM Josh Hunt  wrote:
{snip}
> Initial results with Cong's patch look promising, so far no stalls. We
> will let it run over the long weekend and report back on Tuesday.
>
> Paolo - I have concerns about possible performance regression with the
> change as well. If you can gather some data that would be great. If
> things look good with our low throughput test over the weekend we can
> also try assessing performance next week.
>

We hit what may be the same problem while testing NVIDIA/Mellanox's
GPUDirect RDMA product. Changing NET_SCH_DEFAULT to DEFAULT_FQ_CODEL
mitigated it, although we have no idea why. Maybe you could give that
a try as well?

Besides, our test setup is pretty complex. Do you have a quick test to
reproduce it?
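
To check which default qdisc a machine is actually running before trying the same experiment, something like the following C sketch may help. It assumes the net.core.default_qdisc sysctl is exposed at /proc/sys/net/core/default_qdisc; the helper name read_default_qdisc is made up for illustration and does not come from this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read a qdisc name from a sysctl-style file into buf.
 * Returns 0 on success, -1 if the file cannot be opened or read.
 */
int read_default_qdisc(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path != NULL && buf != NULL);
	if (len == 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}
```

Calling read_default_qdisc("/proc/sys/net/core/default_qdisc", name, sizeof(name)) on a kernel built with DEFAULT_FQ_CODEL should yield "fq_codel", assuming nothing changed the sysctl at runtime.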

-- 
Thanks,
Jike

[tip:x86/pti] x86/mm/pti: Remove dead logic in pti_user_pagetable_walk*()

2018-01-08 Thread tip-bot for Jike Song

Commit-ID:  8d56eff266f3e41a6c39926269c4c3f58f881a8e
Gitweb: https://git.kernel.org/tip/8d56eff266f3e41a6c39926269c4c3f58f881a8e
Author: Jike Song <albca...@gmail.com>
AuthorDate: Tue, 9 Jan 2018 00:03:41 +0800
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Mon, 8 Jan 2018 17:42:13 +0100

x86/mm/pti: Remove dead logic in pti_user_pagetable_walk*()

The following code contains dead logic:

 162  if (pgd_none(*pgd)) {
 163          unsigned long new_p4d_page = __get_free_page(gfp);
 164          if (!new_p4d_page)
 165                  return NULL;
 166
 167          if (pgd_none(*pgd)) {
 168                  set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
 169                  new_p4d_page = 0;
 170          }
 171          if (new_p4d_page)
 172                  free_page(new_p4d_page);
 173  }

The two pgd_none(*pgd) checks at L162 and L167 cannot return different
results, so the condition at L171 is always false.
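
To make the argument concrete, here is a small user-space model in C. It is only a sketch under stated assumptions: slot, populate_old and populate_new are made-up stand-ins for the PGD entry and the walker, and calloc/free stand in for __get_free_page/free_page. With a single caller nothing can change the entry between the two checks, so the old variant never reaches its free branch and both variants behave the same.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how often the "lost the race" cleanup branch runs. */
int freed_count;

/* Old shape: re-check the slot after allocating, free on a lost race. */
void *populate_old(void **slot)
{
	assert(slot != NULL);
	if (*slot == NULL) {
		void *page = calloc(1, 64);
		if (!page)
			return NULL;

		if (*slot == NULL) {	/* same answer as the outer check */
			*slot = page;
			page = NULL;
		}
		if (page) {		/* unreachable with a single caller */
			freed_count++;
			free(page);
		}
	}
	return *slot;
}

/* New shape: the outer check is enough, so install unconditionally. */
void *populate_new(void **slot)
{
	assert(slot != NULL);
	if (*slot == NULL) {
		void *page = calloc(1, 64);
		if (!page)
			return NULL;
		*slot = page;
	}
	return *slot;
}
```

Calling either function repeatedly on the same slot from one thread returns the same pointer every time, and freed_count stays at zero.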

Dave Hansen explained:

 Yes, the double-test was part of an optimization where we attempted to
 avoid using a global spinlock in the fork() path.  We would check for
 unallocated mid-level page tables without the lock.  The lock was only
 taken when we needed to *make* an entry to avoid collisions.
 
 Now that it is all single-threaded, there is no chance of a collision,
 no need for a lock, and no need for the re-check.

As all these functions are only called during init, mark them __init as
well.
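
For readers unfamiliar with __init: the kernel places functions marked this way in a separate init section and frees that memory once boot has finished, which is only safe when nothing calls them afterwards. The sketch below is a user-space analogy, not kernel code. It assumes an ELF target linked with GNU ld or lld, which provide __start_/__stop_ symbols for sections whose names are valid C identifiers; DEMO_INIT, demo_setup, demo_plain and in_demo_init_section are invented names.

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's __init marker. */
#define DEMO_INIT __attribute__((section("demo_init_text")))

/* Section bounds supplied by the linker (assumption: GNU ld or lld, ELF). */
extern const char __start_demo_init_text[];
extern const char __stop_demo_init_text[];

/* Lives in the dedicated section, like an __init function would. */
DEMO_INIT int demo_setup(int x)
{
	return x + 1;
}

/* Lives in the regular text section. */
int demo_plain(int x)
{
	return x + 1;
}

/* Returns 1 if fn was placed inside the demo_init_text section. */
int in_demo_init_section(int (*fn)(int))
{
	uintptr_t p = (uintptr_t)fn;

	return p >= (uintptr_t)__start_demo_init_text &&
	       p < (uintptr_t)__stop_demo_init_text;
}
```

Grouping such functions into one contiguous range is what lets the kernel release them in one go after init; the analogy stops there, since nothing in this sketch frees anything.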

Fixes: 03f4424f348e ("x86/mm/pti: Add functions to clone kernel PMDs")
Signed-off-by: Jike Song <albca...@gmail.com>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Cc: Alan Cox <gno...@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <a...@linux.intel.com>
Cc: Tom Lendacky <thomas.lenda...@amd.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Tim Chen <tim.c.c...@linux.intel.com>
Cc: Jiri Koshina <ji...@kernel.org>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Kees Cook <keesc...@google.com>
Cc: Andi Lutomirski <l...@amacapital.net>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Greg KH <gre...@linux-foundation.org>
Cc: David Woodhouse <d...@amazon.co.uk>
Cc: Paul Turner <p...@google.com>
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20180108160341.3461-1-albca...@gmail.com

---
 arch/x86/mm/pti.c | 32 ++++++--------------------------
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 43d4a4a..ce38f16 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -149,7 +149,7 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
  *
  * Returns a pointer to a P4D on success, or NULL on failure.
  */
-static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
+static __init p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
 {
pgd_t *pgd = kernel_to_user_pgdp(pgd_offset_k(address));
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
@@ -164,12 +164,7 @@ static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
if (!new_p4d_page)
return NULL;
 
-   if (pgd_none(*pgd)) {
-   set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
-   new_p4d_page = 0;
-   }
-   if (new_p4d_page)
-   free_page(new_p4d_page);
+   set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
}
BUILD_BUG_ON(pgd_large(*pgd) != 0);
 
@@ -182,7 +177,7 @@ static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
  *
  * Returns a pointer to a PMD on success, or NULL on failure.
  */
-static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
+static __init pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
 {
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
p4d_t *p4d = pti_user_pagetable_walk_p4d(address);
@@ -194,12 +189,7 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
if (!new_pud_page)
return NULL;
 
-   if (p4d_none(*p4d)) {
-   set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
-   new_pud_page = 0;
-   }
-   if (new_pud_page)
-   free_page(new_pud_page);
+   set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
}
 
pud = pud_offset(p4d, address);
@@ -213,12 +203,7 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
if (!new_pmd_page)
return NULL;
 
-   if (pud_none(*pud)) {
-   set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
-   new_pmd_page = 0;
-   }
-   if (new_pmd_page)
-   free_page(new_pmd_page);
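+   set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
}
 
return pmd_offset(pud, address);
@@ -251,12 +236,7 @@ static __init pte_t *pti_user_pagetable_walk_pte(unsigned long address)
if (!new_pte_page)
return NULL;
 
-   if (pmd_none(*pmd)) {
-   set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
-   new_pte_page = 0;
-   }
-   if (new_pte_page)
-   free_page(new_pte_page);
+   set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
}
 
pte = pte_offset_kernel(pmd, address);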

[PATCH v3] x86/mm/pti: remove dead logic during user pagetable population

2018-01-08 Thread Jike Song

Look at one of the code snippets:

 162  if (pgd_none(*pgd)) {
 163          unsigned long new_p4d_page = __get_free_page(gfp);
 164          if (!new_p4d_page)
 165                  return NULL;
 166
 167          if (pgd_none(*pgd)) {
 168                  set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
 169                  new_p4d_page = 0;
 170          }
 171          if (new_p4d_page)
 172                  free_page(new_p4d_page);
 173  }

The two pgd_none(*pgd) checks at L162 and L167 cannot return different
results, so the condition at L171 is always false.

To quote Dave Hansen:

 > Yes, the double-test was part of an optimization where we attempted to
 > avoid using a global spinlock in the fork() path.  We would check for
 > unallocated mid-level page tables without the lock.  The lock was only
 > taken when we needed to *make* an entry to avoid collisions.
 >
 > Now that it is all single-threaded, there is no chance of a collision,
 > no need for a lock, and no need for the re-check.
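
The pattern Dave describes can be sketched in user space as follows. This is an illustration only, not the removed kernel code: a pthread mutex stands in for the global spinlock, calloc/free stand in for the page allocator, and populate_racy and table_lock are invented names. The unlocked check is the fast path; the re-check under the lock is what makes the cleanup branch reachable when several threads race to fill the same slot.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Stand-in for the global lock that guarded entry creation. */
pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lock-free fast path; the lock is taken only to make an entry. */
void *populate_racy(_Atomic(void *) *slot)
{
	void *cur;

	assert(slot != NULL);
	cur = atomic_load(slot);
	if (cur == NULL) {			/* unlocked check */
		void *page = calloc(1, 64);
		if (!page)
			return NULL;

		pthread_mutex_lock(&table_lock);
		cur = atomic_load(slot);
		if (cur == NULL) {		/* re-check under the lock */
			atomic_store(slot, page);
			cur = page;
			page = NULL;
		}
		pthread_mutex_unlock(&table_lock);

		free(page);	/* non-NULL only if another thread won */
	}
	return cur;
}
```

With a single caller the re-check can never fail, which is exactly why it can be dropped once the code only runs single-threaded during init.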

v3: mark functions __init; add first-hand explanation
v2: add commit message.

Signed-off-by: Jike Song <albca...@gmail.com>
Cc: David Woodhouse <d...@amazon.co.uk>
Cc: Alan Cox <gno...@lxorguk.ukuu.org.uk>
Cc: Jiri Koshina <ji...@kernel.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Tim Chen <tim.c.c...@linux.intel.com>
Cc: Andi Lutomirski  <l...@amacapital.net>
Cc: Andi Kleen <a...@linux.intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Paul Turner <p...@google.com>
Cc: Tom Lendacky <thomas.lenda...@amd.com>
Cc: Greg KH <gre...@linux-foundation.org>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Kees Cook <keesc...@google.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: sta...@vger.kernel.org
---
 arch/x86/mm/pti.c | 32 ++++++--------------------------
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 43d4a4a29037..ce38f165489b 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -149,7 +149,7 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
  *
  * Returns a pointer to a P4D on success, or NULL on failure.
  */
-static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
+static __init p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
 {
pgd_t *pgd = kernel_to_user_pgdp(pgd_offset_k(address));
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
@@ -164,12 +164,7 @@ static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
if (!new_p4d_page)
return NULL;
 
-   if (pgd_none(*pgd)) {
-   set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
-   new_p4d_page = 0;
-   }
-   if (new_p4d_page)
-   free_page(new_p4d_page);
+   set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
}
BUILD_BUG_ON(pgd_large(*pgd) != 0);
 
@@ -182,7 +177,7 @@ static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
  *
  * Returns a pointer to a PMD on success, or NULL on failure.
  */
-static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
+static __init pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
 {
gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO);
p4d_t *p4d = pti_user_pagetable_walk_p4d(address);
@@ -194,12 +189,7 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
if (!new_pud_page)
return NULL;
 
-   if (p4d_none(*p4d)) {
-   set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
-   new_pud_page = 0;
-   }
-   if (new_pud_page)
-   free_page(new_pud_page);
+   set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
}
 
pud = pud_offset(p4d, address);
@@ -213,12 +203,7 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
if (!new_pmd_page)
return NULL;
 
-   if (pud_none(*pud)) {
-   set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
-   new_pmd_page = 0;
-   }
-   if (new_pmd_page)
-   free_page(new_pmd_page);
+   set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
}
 
return pmd_offset(pud, address);
@@ -251,12 +236,7 @@ static __init pte_t *pti_user_pagetable_walk_pte(unsigned long address)
if (!new_pte_page)
return NULL;
 
-   if (pmd_none(*pmd)) {
-   set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
-   new_pte_page = 0;
-   }
-   if (new_pte_page)
-   free_page(new_pte_page);
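+   set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
}
 
pte = pte_offset_kernel(pmd, address);
-- 
2.14.3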

Re: [PATCH] x86/mm/pti: remove dead logic during user pagetable population

2018-01-07 Thread Jike Song

On Sun, Jan 7, 2018 at 5:48 PM, Thomas Gleixner <t...@linutronix.de> wrote:
> On Sun, 7 Jan 2018, Jike Song wrote:
>> On Sun, Jan 7, 2018 at 3:33 AM, Thomas Gleixner <t...@linutronix.de> wrote:
>> > On Sun, 7 Jan 2018, Jike Song wrote:
>> >
>> > Care to explain why you think this is not needed?
>> >
>>
>> Hi Thomas,
>>
>> Look at one of the original code snippets:
>>
>> 162  if (pgd_none(*pgd)) {
>> 163          unsigned long new_p4d_page = __get_free_page(gfp);
>> 164          if (!new_p4d_page)
>> 165                  return NULL;
>> 166
>> 167          if (pgd_none(*pgd)) {
>> 168                  set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
>> 169                  new_p4d_page = 0;
>> 170          }
>> 171          if (new_p4d_page)
>> 172                  free_page(new_p4d_page);
>> 173  }
>>
>> Correct me if I'm too dumb to see the rationale here, but to me there
>> can't be any difference between
>> two pgd_none(*pgd) of L162 and L167, so it is always false in L171.
>
> Right, but this kind of explanation wants to be in the changelog. Empty
> changelogs for this kind of change are just not acceptable.
>

Roger that, just sent v2 out :)

I'm not quite sure it is needed, but I CCed sta...@kernel.org anyway.


> Thanks,
>
> tglx


-- 
Thanks,
Jike

[PATCH v2] x86/mm/pti: remove dead logic during user pagetable population

2018-01-07 Thread Jike Song

Look at one of the code snippets:

162  if (pgd_none(*pgd)) {
163          unsigned long new_p4d_page = __get_free_page(gfp);
164          if (!new_p4d_page)
165                  return NULL;
166
167          if (pgd_none(*pgd)) {
168                  set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
169                  new_p4d_page = 0;
170          }
171          if (new_p4d_page)
172                  free_page(new_p4d_page);
173  }

The two pgd_none(*pgd) checks at L162 and L167 cannot return different
results, so the condition at L171 is always false.

v2: add the commit message above.

Signed-off-by: Jike Song <albca...@gmail.com>
Cc: David Woodhouse <d...@amazon.co.uk>
Cc: Alan Cox <gno...@lxorguk.ukuu.org.uk>
Cc: Jiri Koshina <ji...@kernel.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Tim Chen <tim.c.c...@linux.intel.com>
Cc: Andi Lutomirski  <l...@amacapital.net>
Cc: Andi Kleen <a...@linux.intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Paul Turner <p...@google.com>
Cc: Tom Lendacky <thomas.lenda...@amd.com>
Cc: Greg KH <gre...@linux-foundation.org>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Kees Cook <keesc...@google.com>
Cc: sta...@vger.kernel.org
Signed-off-by: Jike Song <albca...@gmail.com>
---
 arch/x86/mm/pti.c | 28 ++++------------------------
 1 file changed, 4 insertions(+), 24 deletions(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 43d4a4a29037..dc611d039bd5 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -164,12 +164,7 @@ static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
if (!new_p4d_page)
return NULL;
 
-   if (pgd_none(*pgd)) {
-   set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
-   new_p4d_page = 0;
-   }
-   if (new_p4d_page)
-   free_page(new_p4d_page);
+   set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
}
BUILD_BUG_ON(pgd_large(*pgd) != 0);
 
@@ -194,12 +189,7 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
if (!new_pud_page)
return NULL;
 
-   if (p4d_none(*p4d)) {
-   set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
-   new_pud_page = 0;
-   }
-   if (new_pud_page)
-   free_page(new_pud_page);
+   set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
}
 
pud = pud_offset(p4d, address);
@@ -213,12 +203,7 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
if (!new_pmd_page)
return NULL;
 
-   if (pud_none(*pud)) {
-   set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
-   new_pmd_page = 0;
-   }
-   if (new_pmd_page)
-   free_page(new_pmd_page);
+   set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
}
 
return pmd_offset(pud, address);
@@ -251,12 +236,7 @@ static __init pte_t *pti_user_pagetable_walk_pte(unsigned long address)
if (!new_pte_page)
return NULL;
 
-   if (pmd_none(*pmd)) {
-   set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
-   new_pte_page = 0;
-   }
-   if (new_pte_page)
-   free_page(new_pte_page);
+   set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
}
 
pte = pte_offset_kernel(pmd, address);
-- 
2.14.3

Re: [PATCH] x86/mm/pti: remove dead logic during user pagetable population

2018-01-06 Thread Jike Song

On Sun, Jan 7, 2018 at 4:03 AM, Willy Tarreau <w...@1wt.eu> wrote:
> On Sun, Jan 07, 2018 at 01:50:59AM +0800, Jike Song wrote:
>> Signed-off-by: Jike Song <albca...@gmail.com>
>
> It would be nice to have a commit message, particularly in this quite
> sensitive series...

Yes, that would be useful. I will add one in v2 :)

>
> Willy



-- 
Thanks,
Jike

Re: [PATCH] x86/mm/pti: remove dead logic during user pagetable population

2018-01-06 Thread Jike Song

On Sun, Jan 7, 2018 at 3:33 AM, Thomas Gleixner <t...@linutronix.de> wrote:
> On Sun, 7 Jan 2018, Jike Song wrote:
>
> Care to explain why you think this is not needed?
>

Hi Thomas,

Look at one of the original code snippets:

162  if (pgd_none(*pgd)) {
163          unsigned long new_p4d_page = __get_free_page(gfp);
164          if (!new_p4d_page)
165                  return NULL;
166
167          if (pgd_none(*pgd)) {
168                  set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
169                  new_p4d_page = 0;
170          }
171          if (new_p4d_page)
172                  free_page(new_p4d_page);
173  }

Correct me if I'm too dumb to see the rationale here, but to me there
can't be any difference between the two pgd_none(*pgd) checks at L162
and L167, so the condition at L171 is always false.

> Thanks,
>
> tglx

-- 
Thanks,
Jike

[PATCH] x86/mm/pti: remove dead logic during user pagetable population

2018-01-06 Thread Jike Song

Signed-off-by: Jike Song <albca...@gmail.com>
---
 arch/x86/mm/pti.c | 28 ++++------------------------
 1 file changed, 4 insertions(+), 24 deletions(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 43d4a4a29037..dc611d039bd5 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -164,12 +164,7 @@ static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address)
if (!new_p4d_page)
return NULL;
 
-   if (pgd_none(*pgd)) {
-   set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
-   new_p4d_page = 0;
-   }
-   if (new_p4d_page)
-   free_page(new_p4d_page);
+   set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page)));
}
BUILD_BUG_ON(pgd_large(*pgd) != 0);
 
@@ -194,12 +189,7 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
if (!new_pud_page)
return NULL;
 
-   if (p4d_none(*p4d)) {
-   set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
-   new_pud_page = 0;
-   }
-   if (new_pud_page)
-   free_page(new_pud_page);
+   set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page)));
}
 
pud = pud_offset(p4d, address);
@@ -213,12 +203,7 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address)
if (!new_pmd_page)
return NULL;
 
-   if (pud_none(*pud)) {
-   set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
-   new_pmd_page = 0;
-   }
-   if (new_pmd_page)
-   free_page(new_pmd_page);
+   set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page)));
}
 
return pmd_offset(pud, address);
@@ -251,12 +236,7 @@ static __init pte_t *pti_user_pagetable_walk_pte(unsigned long address)
if (!new_pte_page)
return NULL;
 
-   if (pmd_none(*pmd)) {
-   set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
-   new_pte_page = 0;
-   }
-   if (new_pte_page)
-   free_page(new_pte_page);
+   set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page)));
}
 
pte = pte_offset_kernel(pmd, address);
-- 
2.14.3

Re: [RFC PATCH v5 0/5] vfio-pci: Add support for mmapping MSI-X table

2017-08-14 Thread Jike Song

On 08/14/2017 09:12 PM, Robin Murphy wrote:
> On 14/08/17 10:45, Alexey Kardashevskiy wrote:
>> Folks,
>>
>> Is there anything to change besides those compiler errors and David's
>> comment in 5/5? Or is the whole patchset too bad? Thanks.
> 
> While I now understand it's not the low-level thing I first thought it
> was, so my reasoning has changed, personally I don't like this approach
> any more than the previous one - it still smells of abusing external
> APIs to pass information from one part of VFIO to another (and it has
> the same conceptual problem of attributing something to interrupt
> sources that is actually a property of the interrupt target).
> 
> Taking a step back, though, why does vfio-pci perform this check in the
> first place? If a malicious guest already has control of a device, any
> kind of interrupt spoofing it could do by fiddling with the MSI-X
> message address/data it could simply do with a DMA write anyway, so the
> security argument doesn't stand up in general (sure, not all PCIe
> devices may be capable of arbitrary DMA, but that seems like more of a
> tenuous security-by-obscurity angle to me).

Hi Robin,

DMA writes will be translated (thereby censored) by DMA Remapping hardware,
while MSI/MSI-X will not. Is this different for non-x86?

--
Thanks,
Jike

> Besides, with Type1 IOMMU
> the fact that we've let a device be assigned at all means that this is
> already a non-issue (because either the hardware provides isolation or
> the user has explicitly accepted the consequences of an unsafe
> configuration) - from patch #4 that's apparently the same for SPAPR TCE,
> in which case it seems this flag doesn't even need to be propagated and
> could simply be assumed always.
> 
> On the other hand, if the check is not so much to mitigate malicious
> guests attacking the system as to prevent dumb guests breaking
> themselves (e.g. if some or all of the MSI-X capability is actually
> emulated), then allowing things to sometimes go wrong on the grounds of
> an irrelevant hardware feature doesn't seem correct :/
> 
> Robin.
> 
>> On 07/08/17 17:25, Alexey Kardashevskiy wrote:
>>> This is a followup for "[PATCH kernel v4 0/6] vfio-pci: Add support for 
>>> mmapping MSI-X table"
>>> http://www.spinics.net/lists/kvm/msg152232.html
>>>
>>> This time it is using "caps" in IOMMU groups. The main question is if PCI
>>> bus flags or IOMMU domains are still better (and which one).
>>
>>>
>>>
>>>
>>> Here is some background:
>>>
>>> Current vfio-pci implementation disallows to mmap the page
>>> containing MSI-X table in case that users can write directly
>>> to MSI-X table and generate an incorrect MSIs.
>>>
>>> However, this will cause some performance issue when there
>>> are some critical device registers in the same page as the
>>> MSI-X table. We have to handle the mmio access to these
>>> registers in QEMU emulation rather than in guest.
>>>
>>> To solve this issue, this series allows to expose MSI-X table
>>> to userspace when hardware enables the capability of interrupt
>>> remapping which can ensure that a given PCI device can only
>>> shoot the MSIs assigned for it. And we introduce a new bus_flags
>>> PCI_BUS_FLAGS_MSI_REMAP to test this capability on PCI side
>>> for different archs.
>>>
>>>
>>> This is based on sha1
>>> 26c5cebfdb6c "Merge branch 'parisc-4.13-4' of 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux"
>>>
>>> Please comment. Thanks.
>>>
>>> Changelog:
>>>
>>> v5:
>>> * redid the whole thing via so-called IOMMU group capabilities
>>>
>>> v4:
>>> * rebased on recent upstream
>>> * got all 6 patches from v2 (v3 was missing some)
>>>
>>>
>>>
>>>
>>> Alexey Kardashevskiy (5):
>>>   iommu: Add capabilities to a group
>>>   iommu: Set IOMMU_GROUP_CAP_ISOLATE_MSIX if MSI controller enables IRQ
>>> remapping
>>>   iommu/intel/amd: Set IOMMU_GROUP_CAP_ISOLATE_MSIX if IRQ remapping is
>>> enabled
>>>   powerpc/iommu: Set IOMMU_GROUP_CAP_ISOLATE_MSIX
>>>   vfio-pci: Allow to expose MSI-X table to userspace when safe
>>>
>>>  include/linux/iommu.h| 20 
>>>  include/linux/vfio.h |  1 +
>>>  arch/powerpc/kernel/iommu.c  |  1 +
>>>  drivers/iommu/amd_iommu.c|  3 +++
>>>  drivers/iommu/intel-iommu.c  |  3 +++
>>>  drivers/iommu/iommu.c| 35 +++
>>>  drivers/vfio/pci/vfio_pci.c  | 20 +---
>>>  drivers/vfio/pci/vfio_pci_rdwr.c |  5 -
>>>  drivers/vfio/vfio.c  | 15 +++
>>>  9 files changed, 99 insertions(+), 4 deletions(-)
>>>
>>
>>
>

Re: [RFC PATCH v5 0/5] vfio-pci: Add support for mmapping MSI-X table

2017-08-14 Thread Jike Song

On 08/15/2017 09:33 AM, Benjamin Herrenschmidt wrote:
> On Tue, 2017-08-15 at 09:16 +0800, Jike Song wrote:
>>> Taking a step back, though, why does vfio-pci perform this check in the
>>> first place? If a malicious guest already has control of a device, any
>>> kind of interrupt spoofing it could do by fiddling with the MSI-X
>>> message address/data it could simply do with a DMA write anyway, so the
>>> security argument doesn't stand up in general (sure, not all PCIe
>>> devices may be capable of arbitrary DMA, but that seems like more of a
>>> tenuous security-by-obscurity angle to me).
> 
> I tried to make that point for years, thanks for re-iterating it :-)
> 
>> Hi Robin,
>>
>> DMA writes will be translated (thereby censored) by DMA Remapping hardware,
>> while MSI/MSI-X will not. Is this different for non-x86?
> 
> There is no way your DMA remapping HW can differentiate. The only
> difference between a DMA write and an MSI is ... the address. So if I
> can make my device DMA to the MSI address range, I've defeated your
> security.

With IRQ remapping enabled, I don't think you can make your device DMA to
an MSI address without the write being treated as an IRQ and remapped. If
you could, the IRQ remapping hardware would simply be broken :)

--
Thanks,
Jike
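
The point argued above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not how any IOMMU driver is written: the enum and function names are hypothetical, and the only hardware fact relied on is that x86 reserves the 0xFEExxxxx window for interrupt messages. The model is also simplified, since real interrupt remapping hardware additionally distinguishes message formats.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 interrupt address window: device writes to 0xFEExxxxx are interrupt requests. */
#define X86_MSI_ADDR_BASE 0xFEE00000ULL
#define X86_MSI_ADDR_MASK (~0xFFFFFULL)

/* Hypothetical labels for how a device-originated memory write is handled. */
enum write_path {
    PATH_DMA_REMAPPING,  /* translated through the DMA remapping tables */
    PATH_IRQ_REMAPPING,  /* validated against the interrupt remapping table */
    PATH_RAW_INTERRUPT   /* delivered as an unchecked interrupt message */
};

/* The address is the only thing that separates an MSI from any other write. */
static bool is_msi_address(uint64_t addr)
{
    return (addr & X86_MSI_ADDR_MASK) == X86_MSI_ADDR_BASE;
}

/* Simplified model: which unit gets to inspect a write from an assigned device. */
static enum write_path classify_device_write(uint64_t addr, bool irq_remapping_enabled)
{
    if (!is_msi_address(addr))
        return PATH_DMA_REMAPPING;
    return irq_remapping_enabled ? PATH_IRQ_REMAPPING : PATH_RAW_INTERRUPT;
}
```

In this model a device that aims a DMA write at the MSI window only bypasses checking when IRQ remapping is off, which is the case both sides of the thread are arguing about.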

Re: [PATCH] PCI: Do not enable extended tags on pre-dated (v1.x) systems

2017-07-06 Thread Jike Song

On Wed, Jul 5, 2017 at 9:19 PM, Sinan Kaya  wrote:
> According to extended tags ECN document, all PCIe receivers are expected
> to support extended tags support. It should be safe to enable extended
> tags on endpoints without checking compatibility.
>
> This assumption seems to be working fine except for the legacy systems.
> The ECN has been written against PCIE spec version 2.0. Therefore, we need
> to exclude all version 1.0 devices from this change as there is HW out
> there that can't handle extended tags.
>
> Note that the default value of Extended Tags Enable bit is implementation
> specific. Therefore, we are clearing the bit by default when incompatible
> HW is found without assuming that value is zero.
>
> Reported-by: Wim ten Have 
> Link: 
> https://pcisig.com/sites/default/files/specification_documents/ECN_Extended_Tag_Enable_Default_05Sept2008_final.pdf
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=1467674
> Fixes: 60db3a4d8cc9 ("PCI: Enable PCIe Extended Tags if supported")
> Tested-by: Wim ten Have 
> Signed-off-by: Sinan Kaya 
> ---
>  drivers/pci/probe.c | 52 +---
>  1 file changed, 45 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 19c8950..5e39013 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1684,21 +1684,58 @@ static void program_hpp_type2(struct pci_dev *dev, 
> struct hpp_type2 *hpp)
>  */
>  }
>
> -static void pci_configure_extended_tags(struct pci_dev *dev)
> +static bool pcie_bus_exttags_supported(struct pci_bus *bus)
> +{
> +   bool exttags_supported = true;
> +   struct pci_dev *bridge;
> +   int rc;
> +   u16 flags;
> +
> +   bridge = bus->self;
> +   while (bridge) {
> +   if (pci_is_pcie(bridge)) {
> +   rc = pcie_capability_read_word(bridge, PCI_EXP_FLAGS,
> +  &flags);
> +   if (!rc && ((flags & PCI_EXP_FLAGS_VERS) < 2)) {
> +   exttags_supported = false;
> +   break;
> +   }
> +   }
> +   if (!bridge->bus->parent)
> +   break;
> +   bridge = bridge->bus->parent->self;
> +   }
> +
> +   return exttags_supported;
> +}
> +
> +static int pcie_bus_configure_exttags(struct pci_dev *dev, void *data)
>  {
> u32 dev_cap;
> int ret;
> +   bool supported;
>
> if (!pci_is_pcie(dev))
> -   return;
> +   return 0;
>
> ret = pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &dev_cap);
> if (ret)
> -   return;
> +   return 0;
>
> -   if (dev_cap & PCI_EXP_DEVCAP_EXT_TAG)
> -   pcie_capability_set_word(dev, PCI_EXP_DEVCTL,
> -PCI_EXP_DEVCTL_EXT_TAG);
> +   if (dev_cap & PCI_EXP_DEVCAP_EXT_TAG) {
> +   supported = pcie_bus_exttags_supported(dev->bus);
> +

Maybe check the version of this endpoint first? Do you expect a v1
endpoint to work under v2+ ports?

-- 
Thanks,
Jike
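
The suggestion above can be sketched as a user-space model. It is not the patch under review: struct model_dev and path_supports_exttags() are hypothetical stand-ins for struct pci_dev and the quoted pcie_bus_exttags_supported(), and the walk starts at the endpoint itself rather than at bus->self, which is the assumption being proposed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields the quoted patch reads from struct pci_dev. */
struct model_dev {
    bool is_pcie;
    unsigned int pcie_cap_version;  /* PCI_EXP_FLAGS_VERS value: 1 for v1.x, 2 for v2.0+ */
    struct model_dev *upstream;     /* parent bridge, NULL at the root */
};

/*
 * Return true only if the endpoint and every PCIe bridge above it report
 * capability version 2 or later. The endpoint is checked first, so a v1
 * endpoint under v2+ ports is rejected without walking the hierarchy.
 */
static bool path_supports_exttags(const struct model_dev *dev)
{
    for (const struct model_dev *d = dev; d != NULL; d = d->upstream) {
        if (d->is_pcie && d->pcie_cap_version < 2)
            return false;
    }
    return true;
}
```

Whether a v1 endpoint really needs to be excluded is the open question in the reply; the sketch only shows that the extra check costs one comparison.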

Re: [PATCH] kvmgt: Hold struct kvm reference

2017-03-20 Thread Jike Song

On 03/20/2017 10:38 AM, Alex Williamson wrote:
> The kvmgt code keeps a pointer to the struct kvm associated with the
> device, but doesn't actually hold a reference to it.  If we do unclean
> shutdown testing (ie. killing the user process), then we can see the
> kvm association to the device unset, which causes kvmgt to trigger a
> device release via a work queue.  Naturally we cannot guarantee that
> the cached struct kvm pointer is still valid at this point without
> holding a reference.  The observed failure in this case is a stuck
> cpu trying to acquire the spinlock from the invalid reference, but
> other failure modes are clearly possible.  Hold a reference to avoid
> this.
> 
> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
> Cc: sta...@vger.kernel.org #v4.10
> Cc: Jike Song <jike.s...@intel.com>
> Cc: Paolo Bonzini <pbonz...@redhat.com>
> Cc: Zhenyu Wang <zhen...@linux.intel.com>
> Cc: Zhi Wang <zhi.a.w...@intel.com>
> ---

Reviewed-by: Jike Song <jike.s...@intel.com>

Thanks for the fix!

--
Thanks,
Jike

>  drivers/gpu/drm/i915/gvt/kvmgt.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 84d801638ede..142b8bd4ba6b 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -1324,6 +1324,7 @@ static int kvmgt_guest_init(struct mdev_device *mdev)
>   vgpu->handle = (unsigned long)info;
>   info->vgpu = vgpu;
>   info->kvm = kvm;
> + kvm_get_kvm(info->kvm);
>  
>   kvmgt_protect_table_init(info);
>   gvt_cache_init(vgpu);
> @@ -1343,6 +1344,7 @@ static bool kvmgt_guest_exit(struct kvmgt_guest_info 
> *info)
>   }
>  
>   kvm_page_track_unregister_notifier(info->kvm, &info->track_node);
> + kvm_put_kvm(info->kvm);
>   kvmgt_protect_table_destroy(info);
>   gvt_cache_destroy(info->vgpu);
>   vfree(info);
>
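
The get/put pairing in the patch above can be sketched in user space. All names here (model_kvm, model_get, model_put, model_guest_init, model_guest_exit) are hypothetical; the real code uses kvm_get_kvm() and kvm_put_kvm(), and the freed flag only stands in for the object being destroyed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct kvm. */
struct model_kvm {
    int users;
    bool freed;  /* set when the last reference is dropped */
};

static void model_get(struct model_kvm *kvm)
{
    kvm->users++;
}

static void model_put(struct model_kvm *kvm)
{
    if (--kvm->users == 0)
        kvm->freed = true;
}

/* Hypothetical stand-in for struct kvmgt_guest_info. */
struct model_guest_info {
    struct model_kvm *kvm;
};

/* Cache the pointer and pin the object, mirroring the hunk in kvmgt_guest_init(). */
static void model_guest_init(struct model_guest_info *info, struct model_kvm *kvm)
{
    info->kvm = kvm;
    model_get(info->kvm);
}

/* Drop the pin only after the last use of the cached pointer. */
static void model_guest_exit(struct model_guest_info *info)
{
    model_put(info->kvm);
    info->kvm = NULL;
}
```

With the pin in place, the original owner can drop its reference (for example when the user process is killed) and the cached pointer stays valid until model_guest_exit() runs.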

Re: [PATCH] drm/i915/gvt/kvmgt: mdev ABI is available_instances, not available_instance

2017-01-24 Thread Jike Song

On 01/25/2017 03:53 AM, Alex Williamson wrote:
> Per the ABI specification[1], each mdev_supported_types entry should
> have an available_instances, with an "s", not available_instance.
> 
> [1] Documentation/ABI/testing/sysfs-bus-vfio-mdev
> 
> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
> ---
> 
> This should really be fixed before initial release in v4.10

Acked-by: Jike Song <jike.s...@intel.com>

Thanks for finding this!


--
Thanks,
Jike

> 
>  drivers/gpu/drm/i915/gvt/kvmgt.c |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index faaae07..ab1e057 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -230,8 +230,8 @@ static struct intel_vgpu_type 
> *intel_gvt_find_vgpu_type(struct intel_gvt *gvt,
>   return NULL;
>  }
>  
> -static ssize_t available_instance_show(struct kobject *kobj, struct device 
> *dev,
> - char *buf)
> +static ssize_t available_instances_show(struct kobject *kobj,
> + struct device *dev, char *buf)
>  {
>   struct intel_vgpu_type *type;
>   unsigned int num = 0;
> @@ -269,12 +269,12 @@ static ssize_t description_show(struct kobject *kobj, 
> struct device *dev,
>   type->fence);
>  }
>  
> -static MDEV_TYPE_ATTR_RO(available_instance);
> +static MDEV_TYPE_ATTR_RO(available_instances);
>  static MDEV_TYPE_ATTR_RO(device_api);
>  static MDEV_TYPE_ATTR_RO(description);
>  
>  static struct attribute *type_attrs[] = {
> - &mdev_type_attr_available_instance.attr,
> + &mdev_type_attr_available_instances.attr,
>   &mdev_type_attr_device_api.attr,
>   &mdev_type_attr_description.attr,
>   NULL,
>

[v2 1/2] capability: export has_capability

2017-01-12 Thread Jike Song

has_capability() is sometimes needed by modules to test capability
for specified task other than current, so export it.

Cc: Alex Williamson <alex.william...@redhat.com>
Cc: Kirti Wankhede <kwankh...@nvidia.com>
Acked-by: Serge Hallyn <se...@hallyn.com>
Signed-off-by: Jike Song <jike.s...@intel.com>
---
 kernel/capability.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/capability.c b/kernel/capability.c
index a98e814..f97fe77 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -318,6 +318,7 @@ bool has_capability(struct task_struct *t, int cap)
 {
return has_ns_capability(t, &init_user_ns, cap);
 }
+EXPORT_SYMBOL(has_capability);
 
 /**
  * has_ns_capability_noaudit - Does a task have a capability (unaudited)
-- 
1.9.3

[v2 2/2] vfio iommu type1: fix the testing of capability for remote task

2017-01-12 Thread Jike Song

Before the mdev enhancement, type1 iommu used capable() to test the
capability of the current task. In the course of mdev development a
new requirement was raised: testing a task other than current.
ns_capable() was used for this purpose; however, it still tests
current, only in a specified namespace.

Fix it by using has_capability() instead, which tests the capability of
a specified task in init_user_ns, the same namespace that capable() uses.

Cc: Alex Williamson <alex.william...@redhat.com>
Cc: Kirti Wankhede <kwankh...@nvidia.com>
Cc: Gerd Hoffmann <kra...@redhat.com>
Signed-off-by: Jike Song <jike.s...@intel.com>
---
 drivers/vfio/vfio_iommu_type1.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9266271..77373e5 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -495,8 +495,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, 
unsigned long vaddr,
  unsigned long *pfn_base, bool do_accounting)
 {
unsigned long limit;
-   bool lock_cap = ns_capable(task_active_pid_ns(dma->task)->user_ns,
-  CAP_IPC_LOCK);
+   bool lock_cap = has_capability(dma->task, CAP_IPC_LOCK);
struct mm_struct *mm;
int ret;
bool rsvd;
-- 
1.9.3
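
The difference the commit message describes can be sketched in user space. Every name prefixed with model_ is hypothetical; the sketch ignores namespaces entirely and models only which task gets tested. The bit position 14 matches CAP_IPC_LOCK in the Linux capability numbering.

```c
#include <stdbool.h>

#define MODEL_CAP_IPC_LOCK 14  /* same bit number as CAP_IPC_LOCK */

/* Hypothetical task with an effective capability bitmask. */
struct model_task {
    unsigned long long effective_caps;
};

/* Stand-in for the kernel's notion of the running task. */
static struct model_task *model_current;

static bool model_task_has_cap(const struct model_task *t, int cap)
{
    return (t->effective_caps >> cap) & 1ULL;
}

/* Like capable() and ns_capable(): whatever is passed in, it tests current. */
static bool model_capable(int cap)
{
    return model_task_has_cap(model_current, cap);
}

/* Like has_capability(): tests the task the caller names. */
static bool model_has_capability(const struct model_task *t, int cap)
{
    return model_task_has_cap(t, cap);
}
```

When pinning runs on behalf of dma->task from some other context, model_capable() answers for the wrong task, while model_has_capability(dma_task, ...) answers for the right one.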

[v2 0/2] test capability for remote task

2017-01-12 Thread Jike Song

Sometimes vfio iommu type1 needs to pin memory for a remote task other
than current, thereby needs to test the CAP_IPC_LOCK capability for
that task.

The proper routine for this purpose is has_capability(), but it is
not yet exported to modules. None of the currently exported
capability-testing symbols accepts a specified task. So this series
exports has_capability() and then uses it in the vfio iommu type1 driver.



v2: -> Add Serge's Acked-by to PATCH [1/2]
    -> Remove the change in vfio_pin_pages_remote, since it is now
       guaranteed to run in the 'current' process


Hi Alex,

I kept EXPORT_SYMBOL rather than EXPORT_SYMBOL_GPL, since I'm still
worried about changing the type of the existing exports in 'capability.c'.
I'm new to open source and afraid of violating the GPL :)


Jike Song (2):
  capability: export has_capability
  vfio iommu type1: fix the testing of capability for remote task

 drivers/vfio/vfio_iommu_type1.c | 3 +--
 kernel/capability.c | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

-- 
1.9.3

Re: [PATCH 1/2] capability: export has_capability

2017-01-11 Thread Jike Song

On 01/12/2017 02:47 AM, Alex Williamson wrote:
> On Thu, 22 Dec 2016 00:10:15 +0800
> Jike Song <jike.s...@intel.com> wrote:
> 
>> has_capability() is sometimes needed by modules to test capability
>> for specified task other than current, so export it.
>>
>> Cc: Alex Williamson <alex.william...@redhat.com>
>> Cc: Kirti Wankhede <kwankh...@nvidia.com>
>> Signed-off-by: Jike Song <jike.s...@intel.com>
>> ---
>>  kernel/capability.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/capability.c b/kernel/capability.c
>> index 4984e1f..e2e198c 100644
>> --- a/kernel/capability.c
>> +++ b/kernel/capability.c
>> @@ -318,6 +318,7 @@ bool has_capability(struct task_struct *t, int cap)
>>  {
>>  return has_ns_capability(t, &init_user_ns, cap);
>>  }
>> +EXPORT_SYMBOL(has_capability);
>>  
>>  /**
>>   * has_ns_capability_noaudit - Does a task have a capability (unaudited)
> 
> Are we using EXPORT_SYMBOL vs EXPORT_SYMBOL_GPL here to match the other
> exports in this file?  We could use _GPL to match the expected caller
> of this.
> 

Yes, I chose EXPORT_SYMBOL to match the existing exports in capability.c.
Either is good to me, of course :)

> 
> Serge,
> 
> Do you have any comments on this patch?  I'd be happy to pull it
> through the vfio tree with an appropriate Ack.  Thanks,

Guess Serge is still on Xmas vacation? :)

--
Thanks,
Jike
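
For readers following the EXPORT_SYMBOL vs EXPORT_SYMBOL_GPL question above: a plain export is usable by any module, while a _GPL export is only usable by modules that declare a GPL-compatible license. Below is a minimal userspace sketch of that gating. It is a toy model, not kernel code; struct sym, symtab, resolve_symbol() and the "some_gpl_helper" entry are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: these types and names are hypothetical, not kernel APIs. */
struct sym {
	const char *name;
	bool gpl_only;	/* true models EXPORT_SYMBOL_GPL, false models EXPORT_SYMBOL */
};

static const struct sym symtab[] = {
	{ "has_capability", false },	/* plain export, as in the patch discussed above */
	{ "some_gpl_helper", true },	/* made-up GPL-only symbol, for contrast */
};

/* Return the matching entry if the module may link against it, else NULL. */
static const struct sym *resolve_symbol(const char *name, bool module_is_gpl)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
		if (strcmp(symtab[i].name, name) != 0)
			continue;
		if (symtab[i].gpl_only && !module_is_gpl)
			return NULL;	/* GPL-only symbol requested by a non-GPL module */
		return &symtab[i];
	}
	return NULL;	/* symbol is not exported at all */
}
```

In this model, switching has_capability to gpl_only = true would change the outcome only for callers passing module_is_gpl = false, which is the practical difference being weighed in the thread.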

Re: [PATCH 2/2] vfio iommu type1: fix the testing of capability for remote task

2016-12-22 Thread Jike Song

On 12/22/2016 08:20 PM, Kirti Wankhede wrote:
> On 12/21/2016 9:40 PM, Jike Song wrote:
>> Before the mdev enhancement type1 iommu used capable() to test the
>> capability of current task; in the course of mdev development a
>> new requirement, testing for another task other than current, was
>> raised.  ns_capable() was used for this purpose, however it still
>> tests current, the only difference is, in a specified namespace.
>>
>> Fix it by using has_capability() instead, which tests the cap for
>> specified task in init_user_ns, the same namespace as capable().
>>
>> Cc: Alex Williamson <alex.william...@redhat.com>
>> Cc: Kirti Wankhede <kwankh...@nvidia.com>
>> Cc: Gerd Hoffmann <kra...@redhat.com>
>> Signed-off-by: Jike Song <jike.s...@intel.com>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index f3726ba..b54aedf 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -394,8 +394,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, 
>> unsigned long vaddr,
>>long npage, unsigned long *pfn_base)
>>  {
>>  unsigned long limit;
>> -bool lock_cap = ns_capable(task_active_pid_ns(dma->task)->user_ns,
>> -   CAP_IPC_LOCK);
>> +bool lock_cap = has_capability(dma->task, CAP_IPC_LOCK);
> 
> 
> Hi Jike,
> 
> Alex's patch already changes this to capable(), you need to resolve.
> https://lkml.org/lkml/2016/12/20/490
> 
> You need to do only below change, which looks fine to me.
> 

Thanks for pointing that out, will change it in v2.  However, that will probably
be after patch 1/2 is accepted, otherwise we get undefined symbols.

--
Thanks,
Jike

>>  struct mm_struct *mm;
>>  long ret, i = 0, lock_acct = 0;
>>  bool rsvd;
>> @@ -491,8 +490,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, 
>> unsigned long vaddr,
>>unsigned long *pfn_base, bool do_accounting)
>>  {
>>  unsigned long limit;
>> -bool lock_cap = ns_capable(task_active_pid_ns(dma->task)->user_ns,
>> -   CAP_IPC_LOCK);
>> +bool lock_cap = has_capability(dma->task, CAP_IPC_LOCK);
>>  struct mm_struct *mm;
>>  int ret;
>>  bool rsvd;
>>

Re: [PATCH 2/4] vfio-mdev: de-polute the namespace, rename parent_device & parent_ops

2016-12-21 Thread Jike Song

Not sure if this is appropriate, but leaving the Documentation aside,
for patches 2-4:

Reviewed-by: Jike Song <jike.s...@intel.com>

--
Thanks,
Jike

On 12/22/2016 07:27 AM, Alex Williamson wrote:
> From: Alex Williamson <alwil...@nuc.home>
> 
> Add an mdev_ prefix so we're not polluting the namespace so much.
> 
> Cc: Kirti Wankhede <kwankh...@nvidia.com>
> Cc: Zhenyu Wang <zhen...@linux.intel.com>
> Cc: Zhi Wang <zhi.a.w...@intel.com>
> Cc: Jike Song <jike.s...@intel.com>
> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c |2 +-
>  drivers/vfio/mdev/mdev_core.c|   28 ++--
>  drivers/vfio/mdev/mdev_private.h |6 +++---
>  drivers/vfio/mdev/mdev_sysfs.c   |8 
>  drivers/vfio/mdev/vfio_mdev.c|   12 ++--
>  include/linux/mdev.h |   16 
>  samples/vfio-mdev/mtty.c |2 +-
>  7 files changed, 37 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
> b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 4dd6722..081ada2 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -1089,7 +1089,7 @@ static long intel_vgpu_ioctl(struct mdev_device *mdev, 
> unsigned int cmd,
>   return 0;
>  }
>  
> -static const struct parent_ops intel_vgpu_ops = {
> +static const struct mdev_parent_ops intel_vgpu_ops = {
>   .supported_type_groups  = intel_vgpu_type_groups,
>   .create = intel_vgpu_create,
>   .remove = intel_vgpu_remove,
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index be1ee89..4a140e0 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -42,7 +42,7 @@ static int _find_mdev_device(struct device *dev, void *data)
>   return 0;
>  }
>  
> -static bool mdev_device_exist(struct parent_device *parent, uuid_le uuid)
> +static bool mdev_device_exist(struct mdev_parent *parent, uuid_le uuid)
>  {
>   struct device *dev;
>  
> @@ -56,9 +56,9 @@ static bool mdev_device_exist(struct parent_device *parent, 
> uuid_le uuid)
>  }
>  
>  /* Should be called holding parent_list_lock */
> -static struct parent_device *__find_parent_device(struct device *dev)
> +static struct mdev_parent *__find_parent_device(struct device *dev)
>  {
> - struct parent_device *parent;
> + struct mdev_parent *parent;
>  
>   list_for_each_entry(parent, &parent_list, next) {
>   if (parent->dev == dev)
> @@ -69,8 +69,8 @@ static struct parent_device *__find_parent_device(struct 
> device *dev)
>  
>  static void mdev_release_parent(struct kref *kref)
>  {
> - struct parent_device *parent = container_of(kref, struct parent_device,
> - ref);
> + struct mdev_parent *parent = container_of(kref, struct mdev_parent,
> +   ref);
>   struct device *dev = parent->dev;
>  
>   kfree(parent);
> @@ -78,7 +78,7 @@ static void mdev_release_parent(struct kref *kref)
>  }
>  
>  static
> -inline struct parent_device *mdev_get_parent(struct parent_device *parent)
> +inline struct mdev_parent *mdev_get_parent(struct mdev_parent *parent)
>  {
>   if (parent)
>   kref_get(&parent->ref);
> @@ -86,7 +86,7 @@ inline struct parent_device *mdev_get_parent(struct 
> parent_device *parent)
>   return parent;
>  }
>  
> -static inline void mdev_put_parent(struct parent_device *parent)
> +static inline void mdev_put_parent(struct mdev_parent *parent)
>  {
>   if (parent)
>   kref_put(&parent->ref, mdev_release_parent);
> @@ -95,7 +95,7 @@ static inline void mdev_put_parent(struct parent_device 
> *parent)
>  static int mdev_device_create_ops(struct kobject *kobj,
> struct mdev_device *mdev)
>  {
> - struct parent_device *parent = mdev->parent;
> + struct mdev_parent *parent = mdev->parent;
>   int ret;
>  
>   ret = parent->ops->create(kobj, mdev);
> @@ -122,7 +122,7 @@ static int mdev_device_create_ops(struct kobject *kobj,
>   */
>  static int mdev_device_remove_ops(struct mdev_device *mdev, bool 
> force_remove)
>  {
> - struct parent_device *parent = mdev->parent;
> + struct mdev_parent *parent = mdev->parent;
>   int ret;
>  
>   /*
> @@ -153,10 +153,10 @@ static int mdev_device_remove_cb(struct device *dev, 
> void *data)
>   * Add device to list of registered parent devices.
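>   * Returns a negative value on error, otherwise 0.
>   */
> -int mdev_register_device(struct device *dev, const struct parent_ops *ops)
> +int mdev_register_device(struct device *dev, const struct mdev_parent_ops 
> *ops)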
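
The get/put helpers renamed in the diff above follow the usual reference-counting shape: a NULL-tolerant "get" that bumps the count, a NULL-tolerant "put" that drops it, and a release callback that recovers the enclosing object from its embedded counter. A rough userspace sketch of that shape follows. Everything prefixed toy_ is hypothetical, and the counter here is a plain int, whereas the kernel's kref is atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in; the kernel's kref is atomic, this toy counter is not. */
struct toy_kref {
	int refcount;
};

/* Recover the enclosing object from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_parent {
	int id;
	struct toy_kref ref;
};

static int released;	/* counts release calls so the behaviour can be observed */

static void toy_release_parent(struct toy_kref *kref)
{
	struct toy_parent *parent = toy_container_of(kref, struct toy_parent, ref);

	free(parent);
	released++;
}

static struct toy_parent *toy_parent_alloc(int id)
{
	struct toy_parent *parent = calloc(1, sizeof(*parent));

	if (!parent)
		return NULL;
	parent->id = id;
	parent->ref.refcount = 1;	/* the creator holds the first reference */
	return parent;
}

/* NULL-tolerant get, mirroring the shape of mdev_get_parent() in the diff. */
static struct toy_parent *toy_get_parent(struct toy_parent *parent)
{
	if (parent)
		parent->ref.refcount++;
	return parent;
}

/* NULL-tolerant put; frees the object when the last reference is dropped. */
static void toy_put_parent(struct toy_parent *parent)
{
	if (!parent)
		return;
	assert(parent->ref.refcount > 0);
	if (--parent->ref.refcount == 0)
		toy_release_parent(&parent->ref);
}
```

After toy_parent_alloc() plus one toy_get_parent(), two toy_put_parent() calls are needed before the release callback runs.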

Re: [PATCH 0/4] vfio-mdev: Clean namespace and better define ABI

2016-12-21 Thread Jike Song

On 12/22/2016 07:27 AM, Alex Williamson wrote:
> Cleanup the namespace a bit by prefixing structures with mdev_ and
> also more concretely define the mdev interface.  Structs with comments
> defining which fields are private vs public tempts poor behavior,
> especially for an interface where we expect out of tree vendor drivers.

Personally I like this series :)

Side notes: 1) There is also Documentation to be updated; 2) your mail
address in Author field is @nuc.home?

--
Thanks,
Jike

> 
> ---
> 
> Alex Williamson (4):
>   vfio-mdev: Remove an unused structure element
>   vfio-mdev: de-polute the namespace, rename parent_device & parent_ops
>   vfio-mdev: Make mdev_parent private
>   vfio-mdev: Make mdev_device private and abstract interfaces
> 
> 
>  drivers/gpu/drm/i915/gvt/kvmgt.c |   22 +++--
>  drivers/vfio/mdev/mdev_core.c|   64 
> ++
>  drivers/vfio/mdev/mdev_private.h |   28 +++--
>  drivers/vfio/mdev/mdev_sysfs.c   |8 ++---
>  drivers/vfio/mdev/vfio_mdev.c|   12 ---
>  include/linux/mdev.h |   54 +++-
>  samples/vfio-mdev/mtty.c |   28 +
>  7 files changed, 123 insertions(+), 93 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
>
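
The cover letter's point about "private vs public" comments can be illustrated with an opaque type: the public header only forward-declares the structure and offers accessor functions, so out-of-tree code cannot touch core-only fields. The sketch below is a single-file approximation of that idea. All toy_mdev names are hypothetical and are not taken from the series itself.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what a public header would expose (all names hypothetical) ---- */
struct toy_mdev;	/* opaque: vendor drivers see only the name */

struct toy_mdev *toy_mdev_create(void *drvdata);
void *toy_mdev_get_drvdata(const struct toy_mdev *mdev);
void toy_mdev_set_drvdata(struct toy_mdev *mdev, void *data);
void toy_mdev_destroy(struct toy_mdev *mdev);

/* ---- what would stay in a private header or .c file ---- */
struct toy_mdev {
	void *driver_data;
	int internal_state;	/* core-only field, unreachable through the accessors */
};

struct toy_mdev *toy_mdev_create(void *drvdata)
{
	struct toy_mdev *mdev = calloc(1, sizeof(*mdev));

	if (mdev)
		mdev->driver_data = drvdata;
	return mdev;
}

void *toy_mdev_get_drvdata(const struct toy_mdev *mdev)
{
	assert(mdev != NULL);
	return mdev->driver_data;
}

void toy_mdev_set_drvdata(struct toy_mdev *mdev, void *data)
{
	assert(mdev != NULL);
	mdev->driver_data = data;
}

void toy_mdev_destroy(struct toy_mdev *mdev)
{
	free(mdev);
}
```

With the definition split into a private file, a driver that includes only the public part cannot compile code that dereferences the structure, so the compiler enforces the boundary rather than a comment.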

[PATCH 1/2] capability: export has_capability

2016-12-21 Thread Jike Song

has_capability() is sometimes needed by modules to test capability
for specified task other than current, so export it.

Cc: Alex Williamson <alex.william...@redhat.com>
Cc: Kirti Wankhede <kwankh...@nvidia.com>
Signed-off-by: Jike Song <jike.s...@intel.com>
---
 kernel/capability.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/capability.c b/kernel/capability.c
index 4984e1f..e2e198c 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -318,6 +318,7 @@ bool has_capability(struct task_struct *t, int cap)
 {
return has_ns_capability(t, &init_user_ns, cap);
 }
+EXPORT_SYMBOL(has_capability);
 
 /**
  * has_ns_capability_noaudit - Does a task have a capability (unaudited)
-- 
2.4.4.488.gdf97e5d

[PATCH 2/2] vfio iommu type1: fix the testing of capability for remote task

2016-12-21 Thread Jike Song

Before the mdev enhancement type1 iommu used capable() to test the
capability of current task; in the course of mdev development a
new requirement, testing for another task other than current, was
raised.  ns_capable() was used for this purpose, however it still
tests current, the only difference is, in a specified namespace.

Fix it by using has_capability() instead, which tests the cap for
specified task in init_user_ns, the same namespace as capable().

Cc: Alex Williamson <alex.william...@redhat.com>
Cc: Kirti Wankhede <kwankh...@nvidia.com>
Cc: Gerd Hoffmann <kra...@redhat.com>
Signed-off-by: Jike Song <jike.s...@intel.com>
---
 drivers/vfio/vfio_iommu_type1.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f3726ba..b54aedf 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -394,8 +394,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, 
unsigned long vaddr,
  long npage, unsigned long *pfn_base)
 {
unsigned long limit;
-   bool lock_cap = ns_capable(task_active_pid_ns(dma->task)->user_ns,
-  CAP_IPC_LOCK);
+   bool lock_cap = has_capability(dma->task, CAP_IPC_LOCK);
struct mm_struct *mm;
long ret, i = 0, lock_acct = 0;
bool rsvd;
@@ -491,8 +490,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, 
unsigned long vaddr,
  unsigned long *pfn_base, bool do_accounting)
 {
unsigned long limit;
-   bool lock_cap = ns_capable(task_active_pid_ns(dma->task)->user_ns,
-  CAP_IPC_LOCK);
+   bool lock_cap = has_capability(dma->task, CAP_IPC_LOCK);
struct mm_struct *mm;
int ret;
bool rsvd;
-- 
2.4.4.488.gdf97e5d
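
The distinction the commit message draws can be modelled in a few lines: a capable()-style check always looks at the calling task, whereas a has_capability()-style check looks at the task it is handed. The following userspace sketch assumes a flat capability bitmask and ignores user namespaces entirely. The toy_ names are hypothetical; only the value 14 for CAP_IPC_LOCK matches the real Linux definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAP_IPC_LOCK 14	/* same number as CAP_IPC_LOCK in Linux */

/* Hypothetical stand-in for task_struct: just a capability bitmask. */
struct toy_task {
	unsigned long long caps;
};

static struct toy_task *toy_current;	/* stands in for the kernel's 'current' */

/* Models capable(): always inspects the calling task. */
static bool toy_capable(int cap)
{
	assert(toy_current != NULL);
	return (toy_current->caps >> cap) & 1ULL;
}

/* Models has_capability(): inspects the task that is passed in. */
static bool toy_has_capability(const struct toy_task *t, int cap)
{
	assert(t != NULL);
	return (t->caps >> cap) & 1ULL;
}
```

If a privileged worker pins pages on behalf of an unprivileged owner, toy_capable() reports the worker's privilege while toy_has_capability(&owner, ...) reports the owner's, which is the mismatch the patch sets out to fix.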

[PATCH 0/2] test capability for remote task

2016-12-21 Thread Jike Song

Sometimes vfio iommu type1 needs to pin memory for a remote task other
than current, and therefore needs to test the CAP_IPC_LOCK capability
for that task.

The proper routine for this purpose is has_capability(), but it is
not yet exported for modules. None of the currently exported
capability-testing symbols accepts a specified task. So here
has_capability() is exported and then used in the vfio iommu type1 driver.


Jike Song (2):
  capability: export has_capability
  vfio iommu type1: fix the testing of capability for remote task

 drivers/vfio/vfio_iommu_type1.c | 6 ++----
 kernel/capability.c | 1 +
 2 files changed, 3 insertions(+), 4 deletions(-)

-- 
2.4.4.488.gdf97e5d

Re: [PATCH v9 00/12] Add Mediated device support

2016-12-05 Thread Jike Song

On 12/06/2016 01:44 AM, Gerd Hoffmann wrote:
>   Hi,
> 
>> Just want to share that we have published a KVMGT implementation
>> based on this v9 patchset, to:
>>
>>  https://github.com/01org/gvt-linux/tree/gvt-next-kvmgt
>>
>> It doesn't utilize common routines introduced by 05+ patches yet.
>> The complete intel vGPU device-model is contained.
> 
> Tried to use this implementation.  Used the
> topic/gvt-next-kvmgt-mdev-2016-11-18 branch which looked like the most
> recent one.  Setup:
> 

Hi Gerd,

We haven't yet caught up on updating the kvmgt code accordingly,
partly because we are preparing the 'final' version to be upstreamed.

Will push an updated topic/gvt-next-kvmgt-2016-12-06 branch today, sorry for
the inconvenience :)

>   * Everything compiled as modules.
>   * iommu turned off for the igd (intel_iommu=on,igfx_off).
>   * Blacklisted i915 so dracut initrd doesn't load it
> (rd.driver.blacklist=i915)
>   * tweaked module config so kvmgt is loaded before i915,
> also enable gvt:
> 
>   # cat /etc/modprobe.d/kraxel-gvt.conf 
>   options i915 enable_gvt=1
>   softdep i915 pre: kvmgt
> 
> Everything seems to load fine.  Sysfs files are there, and I can create
> vgpus.
> 

Yes, everything looks good so far.

> Trying to assign a vgpu this way:
> 
>   -device vfio-pci,sysfsdev=/sys/class/mdev_bus/:00:02.0/
> 
> fails though and gives this message in the kernel log:
> 
>   [  402.560350] [drm:intel_vgpu_open [kvmgt]] *ERROR* gvt: KVM is
> required to use Intel vGPU
> 
> Trying the same with a mtty sample device works and I can see the pci
> serial device in the guest.
> 
> Any clues what is going wrong?

The code that obtains the kvm instance is missing in that branch; it will be
included in the new one.

> Has this version any support for exporting the guest display as dma-buf,
> so qemu can show it?  Or is this a headless vgpu?

No, this version doesn't have dma-buf support yet; we have been using x11vnc
in the guest to test it internally. I'll add you to the igvt-g-dev
mailing list for further discussion :)

--
Thanks,
Jike

Re: [Qemu-devel] [PATCH v14 00/22] Add Mediated device support

2016-11-17 Thread Jike Song

On 11/18/2016 10:00 AM, Kirti Wankhede wrote:
> On 11/18/2016 3:35 AM, Neo Jia wrote:
>> On Thu, Nov 17, 2016 at 02:25:15PM -0700, Alex Williamson wrote:
>>> On Thu, 17 Nov 2016 02:16:12 +0530
>>> Kirti Wankhede  wrote:

  Documentation/ABI/testing/sysfs-bus-vfio-mdev |  111 ++
  Documentation/vfio-mediated-device.txt|  399 +++
  MAINTAINERS   |9 +
  drivers/vfio/Kconfig  |1 +
  drivers/vfio/Makefile |1 +
  drivers/vfio/mdev/Kconfig |   17 +
  drivers/vfio/mdev/Makefile|5 +
  drivers/vfio/mdev/mdev_core.c |  385 +++
  drivers/vfio/mdev/mdev_driver.c   |  119 ++
  drivers/vfio/mdev/mdev_private.h  |   41 +
  drivers/vfio/mdev/mdev_sysfs.c|  286 +
  drivers/vfio/mdev/vfio_mdev.c |  180 +++
  drivers/vfio/pci/vfio_pci.c   |   83 +-
  drivers/vfio/platform/vfio_platform_common.c  |   31 +-
  drivers/vfio/vfio.c   |  340 +-
  drivers/vfio/vfio_iommu_type1.c   |  872 +++---
  include/linux/mdev.h  |  177 +++
  include/linux/vfio.h  |   32 +-
  include/uapi/linux/vfio.h |   10 +
  samples/vfio-mdev/Makefile|   13 +
  samples/vfio-mdev/mtty.c  | 1503 
 +
  21 files changed, 4358 insertions(+), 257 deletions(-)
  create mode 100644 Documentation/ABI/testing/sysfs-bus-vfio-mdev
  create mode 100644 Documentation/vfio-mediated-device.txt
  create mode 100644 drivers/vfio/mdev/Kconfig
  create mode 100644 drivers/vfio/mdev/Makefile
  create mode 100644 drivers/vfio/mdev/mdev_core.c
  create mode 100644 drivers/vfio/mdev/mdev_driver.c
  create mode 100644 drivers/vfio/mdev/mdev_private.h
  create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
  create mode 100644 drivers/vfio/mdev/vfio_mdev.c
  create mode 100644 include/linux/mdev.h
  create mode 100644 samples/vfio-mdev/Makefile
  create mode 100644 samples/vfio-mdev/mtty.c
>>>
>>> As discussed, I dropped patch 12, updated the documentation, and added
>>> 'retries' initialization.  This is now applied to my next branch for
>>> v4.10.  Thanks to the reviewers and Kirti and Neo for your hard work!
>>
>> Really appreciate your help and reviews to allow us reach here, and thanks to
>> various reviewers for their comments and suggestions!
>>
> 
> Thanks for your constant guidance and reviews.
> Thanks to all reviewers for reviews and suggestions.

Echoing Alex: thanks for your great work, congrats! :-)

--
Thanks,
Jike

Re: [PATCH v14 12/22] vfio: Add notifier callback to parent's ops structure of mdev

2016-11-17 Thread Jike Song

On 11/17/2016 04:46 AM, Kirti Wankhede wrote:
> Add a notifier callback to parent's ops structure of mdev device so that per
> device notifier for vfio module is registered through vfio_mdev module.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: Iafa6f1721aecdd6e50eb93b153b5621e6d29b637
> ---
>  drivers/vfio/mdev/vfio_mdev.c | 34 +-
>  include/linux/mdev.h  |  9 +
>  2 files changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
> index ffc36758cb84..2f8e06e5f95a 100644
> --- a/drivers/vfio/mdev/vfio_mdev.c
> +++ b/drivers/vfio/mdev/vfio_mdev.c
> @@ -24,6 +24,15 @@
>  #define DRIVER_AUTHOR   "NVIDIA Corporation"
>  #define DRIVER_DESC "VFIO based driver for Mediated device"
>  
> +static int vfio_mdev_notifier(struct notifier_block *nb, unsigned long action,
> +   void *data)
> +{
> + struct mdev_device *mdev = container_of(nb, struct mdev_device, nb);
> + struct parent_device *parent = mdev->parent;
> +
> + return parent->ops->notifier(mdev, action, data);
> +}
> +
>  static int vfio_mdev_open(void *device_data)
>  {
>   struct mdev_device *mdev = device_data;
> @@ -36,9 +45,27 @@ static int vfio_mdev_open(void *device_data)
>   if (!try_module_get(THIS_MODULE))
>   return -ENODEV;
>  
> + if (likely(parent->ops->notifier)) {
> + mdev->nb.notifier_call = vfio_mdev_notifier;
> + ret = vfio_register_notifier(&mdev->dev, &mdev->nb);
> +
> + /*
> +  * This should not fail if backend iommu module doesn't support
> +  * register_notifier.
> +  */
> + if (ret && (ret != -ENOTTY)) {
> + pr_err("Failed to register notifier for mdev\n");
> + module_put(THIS_MODULE);
> + return ret;
> + }
> + }
> +
>   ret = parent->ops->open(mdev);
> - if (ret)
> + if (ret) {
> + if (likely(parent->ops->notifier))
> + vfio_unregister_notifier(&mdev->dev, &mdev->nb);
>   module_put(THIS_MODULE);
> + }
>  
>   return ret;
>  }
> @@ -51,6 +78,11 @@ static void vfio_mdev_release(void *device_data)
>   if (likely(parent->ops->release))
>   parent->ops->release(mdev);
>  
> + if (likely(parent->ops->notifier)) {
> + if (vfio_unregister_notifier(&mdev->dev, &mdev->nb))
> + pr_err("Failed to unregister notifier for mdev\n");
> + }
> +
>   module_put(THIS_MODULE);
>  }
>  
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index ec819e9a115a..94c43034c297 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -37,6 +37,7 @@ struct mdev_device {
>   struct kref ref;
>   struct list_head next;
>   struct kobject  *type_kobj;
> + struct notifier_block   nb;
>  };
>  
>  /**
> @@ -85,6 +86,12 @@ struct mdev_device {
>   * @mmap:mmap callback
>   *   @mdev: mediated device structure
>   *   @vma: vma structure
> + * @notifier: Notifier callback, currently only for
> + *   VFIO_IOMMU_NOTIFY_DMA_UNMAP action notified during
> + *   DMA_UNMAP call on mapped iova range.
> + *   @mdev: mediated device structure
> + *   @action: Action for which notifier is called
> + *   @data: Data associated with the notifier
>   * Parent device that support mediated device should be registered with mdev
>   * module with parent_ops structure.
>   **/
> @@ -106,6 +113,8 @@ struct parent_ops {
>   ssize_t (*ioctl)(struct mdev_device *mdev, unsigned int cmd,
>unsigned long arg);
>   int (*mmap)(struct mdev_device *mdev, struct vm_area_struct *vma);
> + int (*notifier)(struct mdev_device *mdev, unsigned long action,
> + void *data);
>  };
>  
>  /* interface for exporting mdev supported type attributes */
>

Hi Alex, Kirti,

Since everyone agreed that the vendor driver should call vfio_register_notifier()
directly, could you drop this patch from the merge? That way I won't need to
send a revert patch.

Thanks :)

--
Thanks,
Jike

Re: [PATCH v14 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP

2016-11-16 Thread Jike Song

On 11/17/2016 04:46 AM, Kirti Wankhede wrote:
> Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers
> about DMA_UNMAP.
> Exported two APIs vfio_register_notifier() and vfio_unregister_notifier().
> Notifier should be registered, if external user wants to use
> vfio_pin_pages()/vfio_unpin_pages() APIs to pin/unpin pages.
> Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate
> mappings.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: I5910d0024d6be87f3e8d3e0ca0eaeaaa0b17f271
> ---
>  drivers/vfio/vfio.c | 73 ++
>  drivers/vfio/vfio_iommu_type1.c | 77 +
>  include/linux/vfio.h| 12 +++
>  3 files changed, 147 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index bd36c16b0ef2..c850ba324be2 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1901,6 +1901,79 @@ err_unpin_pages:
>  }
>  EXPORT_SYMBOL(vfio_unpin_pages);
>  
> +int vfio_register_notifier(struct device *dev, struct notifier_block *nb)
> +{
> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + ssize_t ret;
> +

Is there any reason 'ret' is declared as 'ssize_t' here (and in unregister)?
Both functions return int.

--
Thanks,
Jike
> + if (!dev || !nb)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (IS_ERR(group))
> + return PTR_ERR(group);
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto err_register_nb;
> +
> + container = group->container;
> + down_read(&container->group_lock);
> +
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->register_notifier))
> + ret = driver->ops->register_notifier(container->iommu_data, nb);
> + else
> + ret = -ENOTTY;
> +
> + up_read(&container->group_lock);
> + vfio_group_try_dissolve_container(group);
> +
> +err_register_nb:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_register_notifier);
> +
> +int vfio_unregister_notifier(struct device *dev, struct notifier_block *nb)
> +{
> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + ssize_t ret;
> +
> + if (!dev || !nb)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (IS_ERR(group))
> + return PTR_ERR(group);
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto err_unregister_nb;
> +
> + container = group->container;
> + down_read(&container->group_lock);
> +
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->unregister_notifier))
> + ret = driver->ops->unregister_notifier(container->iommu_data,
> +nb);
> + else
> + ret = -ENOTTY;
> +
> + up_read(&container->group_lock);
> + vfio_group_try_dissolve_container(group);
> +
> +err_unregister_nb:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_unregister_notifier);
> +
>  /**
>   * Module/class support
>   */
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 98191fc590f8..63fbc48a088f 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson "
> @@ -60,6 +61,7 @@ struct vfio_iommu {
>   struct vfio_domain  *external_domain; /* domain for external user */
>   struct mutex        lock;
>   struct rb_root      dma_list;
> + struct blocking_notifier_head notifier;
>   bool                v2;
>   bool                nesting;
>  };
> @@ -561,7 +563,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
>  
>   mutex_lock(&iommu->lock);
>  
> - if (!iommu->external_domain) {
> + /* Fail if notifier list is empty */
> + if ((!iommu->external_domain) || (!iommu->notifier.head)) {
>   ret = -EINVAL;
>   goto pin_done;
>   }
> @@ -776,9 +779,9 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>struct vfio_iommu_type1_dma_unmap *unmap)
>  {
>   uint64_t mask;
> - struct vfio_dma *dma;
> + struct vfio_dma *dma, *dma_last = NULL;
>   size_t unmapped = 0;
> - int ret = 0;
> + int ret = 0, retries;
>  
>   mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>  
> @@ -788,7 +791,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>   return -EINVAL;
>  
>   WARN_ON(mask & PAGE_MASK);
> -
> +again:
>   mutex_lock(&iommu->lock);
>  
>

Re: [PATCH v12 12/22] vfio: Add notifier callback to parent's ops structure of mdev

2016-11-15 Thread Jike Song

On 11/15/2016 11:19 PM, Alex Williamson wrote:
> On Tue, 15 Nov 2016 14:45:42 +0800
> Jike Song <jike.s...@intel.com> wrote:
> 
>> On 11/14/2016 11:42 PM, Kirti Wankhede wrote:
>>> Add a notifier callback to parent's ops structure of mdev device so that per
>>> device notifier for vfio module is registered through vfio_mdev module.
>>>
>>> Signed-off-by: Kirti Wankhede <kwankh...@nvidia.com>
>>> Signed-off-by: Neo Jia <c...@nvidia.com>
>>> Change-Id: Iafa6f1721aecdd6e50eb93b153b5621e6d29b637
>>> ---
>>>  drivers/vfio/mdev/vfio_mdev.c | 19 +++
>>>  include/linux/mdev.h  |  9 +
>>>  2 files changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
>>> index ffc36758cb84..1694b1635607 100644
>>> --- a/drivers/vfio/mdev/vfio_mdev.c
>>> +++ b/drivers/vfio/mdev/vfio_mdev.c
>>> @@ -24,6 +24,15 @@
>>>  #define DRIVER_AUTHOR   "NVIDIA Corporation"
>>>  #define DRIVER_DESC "VFIO based driver for Mediated device"
>>>  
>>> +static int vfio_mdev_notifier(struct notifier_block *nb, unsigned long action,
>>> + void *data)
>>> +{
>>> +   struct mdev_device *mdev = container_of(nb, struct mdev_device, nb);
>>> +   struct parent_device *parent = mdev->parent;
>>> +
>>> +   return parent->ops->notifier(mdev, action, data);
>>> +}
>>> +
>>>  static int vfio_mdev_open(void *device_data)
>>>  {
>>> struct mdev_device *mdev = device_data;
>>> @@ -40,6 +49,11 @@ static int vfio_mdev_open(void *device_data)
>>> if (ret)
>>> module_put(THIS_MODULE);
>>>  
>>> +   if (likely(parent->ops->notifier)) {
>>> +   mdev->nb.notifier_call = vfio_mdev_notifier;
>>> +   if (vfio_register_notifier(&mdev->dev, &mdev->nb))
>>> +   pr_err("Failed to register notifier for mdev\n");
>>> +   }  
>>
>> Hi Kirti,
>>
>> Could you please move the notifier registration before parent->ops->open()?
>> As you might know, I'm extending your vfio_register_notifier to also include
>> the attaching/detaching events of vfio_group and kvm.  Basically, if the
>> vfio_group is not attached to any kvm instance, parent->ops->open() should
>> return -ENODEV to indicate the failure, but to know whether kvm is available
>> in open(), the notifier registration should happen earlier.
> 
> It seems like you're giving general guidance for how a vendor driver
> open() function should work, yet a hard dependency on KVM should be
> discouraged.  You're making a choice for your vendor driver alone.

I apologize for any confusion. All I meant was that if a vendor driver
needs to report an error rather than a false success, it has to know the
relevant state (here, whether kvm is attached) before open() makes that choice.

> I would also be very cautious about the coherency of signaling the KVM
> association relative to the user of the group.  Is it possible that the
> association of one KVM instance by a user of the group can leak to the
> next user?  Does vfio need to see a gratuitous un-set of the KVM
> association on group close()? etc.  Thanks,

I fail to see how this is possible. As I understand it,
vfio_group_set_kvm() gets called twice (once with the kvm pointer, once
with NULL), both while kvm holds the group reference.

Would you elaborate a bit more?
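
To make my understanding concrete, here is a minimal userspace sketch of the
set/un-set pairing. It is only a model under stated assumptions: the names
model_group, model_group_set_kvm, model_kvm_attach and model_kvm_detach are
hypothetical and are not the real vfio or kvm code. It assumes the un-set
always happens before kvm drops its group reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vfio group; not the kernel structure. */
struct model_group {
	void *kvm;	/* current KVM association, NULL when none */
	int refs;	/* references held on the group */
};

/* Models the set_kvm call: records (or clears) the association. */
static void model_group_set_kvm(struct model_group *group, void *kvm)
{
	group->kvm = kvm;
}

/* KVM side, attach: take the group reference first, then set the pointer. */
static void model_kvm_attach(struct model_group *group, void *kvm)
{
	group->refs++;
	model_group_set_kvm(group, kvm);
}

/* KVM side, detach: clear the pointer first, then drop the reference. */
static void model_kvm_detach(struct model_group *group)
{
	model_group_set_kvm(group, NULL);
	group->refs--;
}
```

In this model the association is never visible outside the window in which
the reference is held. It says nothing about a group user that cached the
pointer and outlives the detach, which may be the case you have in mind.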


--
Thanks,
Jike

Re: [PATCH v12 12/22] vfio: Add notifier callback to parent's ops structure of mdev

2016-11-15 Thread Jike Song

On 11/15/2016 04:11 PM, Kirti Wankhede wrote:
> 
> 
> On 11/15/2016 12:15 PM, Jike Song wrote:
>> On 11/14/2016 11:42 PM, Kirti Wankhede wrote:
>>> Add a notifier callback to parent's ops structure of mdev device so that per
>>> device notifier for vfio module is registered through vfio_mdev module.
>>>
>>> Signed-off-by: Kirti Wankhede <kwankh...@nvidia.com>
>>> Signed-off-by: Neo Jia <c...@nvidia.com>
>>> Change-Id: Iafa6f1721aecdd6e50eb93b153b5621e6d29b637
>>> ---
>>>  drivers/vfio/mdev/vfio_mdev.c | 19 +++
>>>  include/linux/mdev.h  |  9 +
>>>  2 files changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
>>> index ffc36758cb84..1694b1635607 100644
>>> --- a/drivers/vfio/mdev/vfio_mdev.c
>>> +++ b/drivers/vfio/mdev/vfio_mdev.c
>>> @@ -24,6 +24,15 @@
>>>  #define DRIVER_AUTHOR   "NVIDIA Corporation"
>>>  #define DRIVER_DESC "VFIO based driver for Mediated device"
>>>  
>>> +static int vfio_mdev_notifier(struct notifier_block *nb, unsigned long action,
>>> + void *data)
>>> +{
>>> +   struct mdev_device *mdev = container_of(nb, struct mdev_device, nb);
>>> +   struct parent_device *parent = mdev->parent;
>>> +
>>> +   return parent->ops->notifier(mdev, action, data);
>>> +}
>>> +
>>>  static int vfio_mdev_open(void *device_data)
>>>  {
>>> struct mdev_device *mdev = device_data;
>>> @@ -40,6 +49,11 @@ static int vfio_mdev_open(void *device_data)
>>> if (ret)
>>> module_put(THIS_MODULE);
>>>  
>>> +   if (likely(parent->ops->notifier)) {
>>> +   mdev->nb.notifier_call = vfio_mdev_notifier;
>>> +   if (vfio_register_notifier(&mdev->dev, &mdev->nb))
>>> +   pr_err("Failed to register notifier for mdev\n");
>>> +   }
>>
>> Hi Kirti,
>>
>> Could you please move the notifier registration before parent->ops->open()?
>> As you might know, I'm extending your vfio_register_notifier to also include
>> the attaching/detaching events of vfio_group and kvm.  Basically, if the
>> vfio_group is not attached to any kvm instance, parent->ops->open() should
>> return -ENODEV to indicate the failure, but to know whether kvm is available
>> in open(), the notifier registration should happen earlier.
>>
> 
> Ok. That seem fine to me.
> 

Thanks. I guess it would also be good to move the unregister after ->release(),
so that the register-open-release-unregister sequence is guaranteed :)
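
As a rough illustration of that ordering, here is a small userspace sketch.
Everything in it is a hypothetical model (model_log, model_open,
model_release and model_order_is_symmetric are made-up names, not vfio_mdev
code); it only records the order in which the four steps would run.

```c
#include <assert.h>
#include <stddef.h>

/* The four steps whose relative order matters. */
enum model_event { EV_REGISTER, EV_OPEN, EV_RELEASE, EV_UNREGISTER };

/* Simple fixed-size event log; hypothetical, for illustration only. */
struct model_log {
	enum model_event events[8];
	size_t count;
};

static void model_record(struct model_log *log, enum model_event ev)
{
	if (log->count < sizeof(log->events) / sizeof(log->events[0]))
		log->events[log->count++] = ev;
}

/* Open path: register the notifier, then call the vendor open(). */
static void model_open(struct model_log *log)
{
	model_record(log, EV_REGISTER);
	model_record(log, EV_OPEN);
}

/* Release path: call the vendor release(), then unregister the notifier. */
static void model_release(struct model_log *log)
{
	model_record(log, EV_RELEASE);
	model_record(log, EV_UNREGISTER);
}

/* Returns 1 if the log is exactly register, open, release, unregister. */
static int model_order_is_symmetric(const struct model_log *log)
{
	static const enum model_event expected[] = {
		EV_REGISTER, EV_OPEN, EV_RELEASE, EV_UNREGISTER
	};
	size_t i;

	if (log->count != sizeof(expected) / sizeof(expected[0]))
		return 0;
	for (i = 0; i < log->count; i++)
		if (log->events[i] != expected[i])
			return 0;
	return 1;
}
```

With this nesting the vendor callbacks only ever run while the notifier is
registered, so release() can still rely on whatever the notifier delivered.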

--
Thanks,
Jike

Re: [PATCH v12 12/22] vfio: Add notifier callback to parent's ops structure of mdev

2016-11-14 Thread Jike Song

On 11/14/2016 11:42 PM, Kirti Wankhede wrote:
> Add a notifier callback to parent's ops structure of mdev device so that per
> device notifier for vfio module is registered through vfio_mdev module.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: Iafa6f1721aecdd6e50eb93b153b5621e6d29b637
> ---
>  drivers/vfio/mdev/vfio_mdev.c | 19 +++
>  include/linux/mdev.h  |  9 +
>  2 files changed, 28 insertions(+)
> 
> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
> index ffc36758cb84..1694b1635607 100644
> --- a/drivers/vfio/mdev/vfio_mdev.c
> +++ b/drivers/vfio/mdev/vfio_mdev.c
> @@ -24,6 +24,15 @@
>  #define DRIVER_AUTHOR   "NVIDIA Corporation"
>  #define DRIVER_DESC "VFIO based driver for Mediated device"
>  
> +static int vfio_mdev_notifier(struct notifier_block *nb, unsigned long action,
> +   void *data)
> +{
> + struct mdev_device *mdev = container_of(nb, struct mdev_device, nb);
> + struct parent_device *parent = mdev->parent;
> +
> + return parent->ops->notifier(mdev, action, data);
> +}
> +
>  static int vfio_mdev_open(void *device_data)
>  {
>   struct mdev_device *mdev = device_data;
> @@ -40,6 +49,11 @@ static int vfio_mdev_open(void *device_data)
>   if (ret)
>   module_put(THIS_MODULE);
>  
> + if (likely(parent->ops->notifier)) {
> + mdev->nb.notifier_call = vfio_mdev_notifier;
> + if (vfio_register_notifier(&mdev->dev, &mdev->nb))
> + pr_err("Failed to register notifier for mdev\n");
> + }

Hi Kirti,

Could you please move the notifier registration before parent->ops->open()?
As you might know, I'm extending your vfio_register_notifier to also include
the attaching/detaching events of vfio_group and kvm.  Basically, if the
vfio_group is not attached to any kvm instance, parent->ops->open() should
return -ENODEV to indicate the failure, but to know whether kvm is available
in open(), the notifier registration should happen earlier.

Of course I can call vfio_register_notifier() from an earlier place to
work around it, but that doesn't seem like the canonical way.
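
Here is a minimal userspace sketch of why the order matters. It is a model,
not kernel code: model_mdev, model_register_notifier and the other model_*
names are hypothetical, and it assumes (as in my planned extension, which is
not in this series) that registering the notifier is what makes the current
kvm association visible to the vendor driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device state; not struct mdev_device. */
struct model_mdev {
	void *kvm;			/* filled in through the notifier */
	int notifier_registered;
};

/* Assumption: registration delivers the current kvm association. */
static int model_register_notifier(struct model_mdev *mdev, void *current_kvm)
{
	mdev->notifier_registered = 1;
	mdev->kvm = current_kvm;
	return 0;
}

static void model_unregister_notifier(struct model_mdev *mdev)
{
	mdev->notifier_registered = 0;
	mdev->kvm = NULL;
}

/* Vendor open(): refuses to proceed when no kvm instance is known. */
static int model_vendor_open(struct model_mdev *mdev)
{
	return mdev->kvm ? 0 : -ENODEV;
}

/* Requested order: register first, so open() can see the association. */
static int model_open_register_first(struct model_mdev *mdev, void *current_kvm)
{
	int ret = model_register_notifier(mdev, current_kvm);

	if (ret)
		return ret;
	ret = model_vendor_open(mdev);
	if (ret)
		model_unregister_notifier(mdev);	/* undo on failure */
	return ret;
}

/* Order in this patch: open() runs before the notifier is registered. */
static int model_open_register_last(struct model_mdev *mdev, void *current_kvm)
{
	int ret = model_vendor_open(mdev);

	if (ret)
		return ret;
	return model_register_notifier(mdev, current_kvm);
}
```

In this model, with the second order open() fails with -ENODEV even when a
kvm instance is attached, because nothing has told the vendor driver yet.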

--
Thanks,
Jike

>   return ret;
>  }
>  
> @@ -48,6 +62,11 @@ static void vfio_mdev_release(void *device_data)
>   struct mdev_device *mdev = device_data;
>   struct parent_device *parent = mdev->parent;
>  
> + if (likely(parent->ops->notifier)) {
> + if (vfio_unregister_notifier(&mdev->dev, &mdev->nb))
> + pr_err("Failed to unregister notifier for mdev\n");
> + }
> +
>   if (likely(parent->ops->release))
>   parent->ops->release(mdev);
>  
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index 4900cc472364..665afe0a4c31 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -37,6 +37,7 @@ struct mdev_device {
>   struct kref ref;
>   struct list_head next;
>   struct kobject  *type_kobj;
> + struct notifier_block   nb;
>  };
>  
>  /**
> @@ -85,6 +86,12 @@ struct mdev_device {
>   * @mmap:mmap callback
>   *   @mdev: mediated device structure
>   *   @vma: vma structure
> + * @notifer: Notifier callback, currently only for
> + *   VFIO_IOMMU_NOTIFY_DMA_UNMAP action notified duing
> + *   DMA_UNMAP call on mapped iova range.
> + *   @mdev: mediated device structure
> + *   @action: Action for which notifier is called
> + *   @data: Data associated with the notifier
>   * Parent device that support mediated device should be registered with mdev
>   * module with parent_ops structure.
>   **/
> @@ -106,6 +113,8 @@ struct parent_ops {
>   ssize_t (*ioctl)(struct mdev_device *mdev, unsigned int cmd,
>unsigned long arg);
>   int (*mmap)(struct mdev_device *mdev, struct vm_area_struct *vma);
> + int (*notifier)(struct mdev_device *mdev, unsigned long action,
> + void *data);
>  };
>  
>  /* interface for exporting mdev supported type attributes */
>

Re: [PATCH v11 10/22] vfio iommu type1: Add support for mediated devices

2016-11-07 Thread Jike Song

On 11/08/2016 07:16 AM, Alex Williamson wrote:
> On Sat, 5 Nov 2016 02:40:44 +0530
> Kirti Wankhede  wrote:
> 
>> VFIO IOMMU drivers are designed for the devices which are IOMMU capable.
>> Mediated device only uses IOMMU APIs, the underlying hardware can be
>> managed by an IOMMU domain.
>>
>> Aim of this change is:
>> - To use most of the code of TYPE1 IOMMU driver for mediated devices
>> - To support direct assigned device and mediated device in single module
>>
>> This change adds pin and unpin support for mediated device to TYPE1 IOMMU
>> backend module. More details:
>> - vfio_pin_pages() callback here uses task and address space of vfio_dma,
>>   that is, of the process who mapped that iova range.
>> - Added pfn_list tracking logic to address space structure. All pages
>>   pinned through this interface are trached in its address space.
>   ^ k
> --|
> 
>> - Pinned pages list is used to verify unpinning request and to unpin
>>   remaining pages while detaching the group for that device.
>> - Page accounting is updated to account in its address space where the
>>   pages are pinned/unpinned.
>> -  Accouting for mdev device is only done if there is no iommu capable
>>   domain in the container. When there is a direct device assigned to the
>>   container and that domain is iommu capable, all pages are already pinned
>>   during DMA_MAP.
>> - Page accouting is updated on hot plug and unplug mdev device and pass
>>   through device.
>>
>> Tested by assigning below combinations of devices to a single VM:
>> - GPU pass through only
>> - vGPU device only
>> - One GPU pass through and one vGPU device
>> - Linux VM hot plug and unplug vGPU device while GPU pass through device
>>   exist
>> - Linux VM hot plug and unplug GPU pass through device while vGPU device
>>   exist
>>
>> Signed-off-by: Kirti Wankhede 
>> Signed-off-by: Neo Jia 
>> Change-Id: I295d6f0f2e0579b8d9882bfd8fd5a4194b97bd9a
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 538 +---
>>  1 file changed, 500 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 8d64528dcc22..e511073446a0 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -36,6 +36,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #define DRIVER_VERSION  "0.2"
>>  #define DRIVER_AUTHOR   "Alex Williamson "
>> @@ -56,6 +57,7 @@ MODULE_PARM_DESC(disable_hugepages,
>>  struct vfio_iommu {
>>  struct list_head domain_list;
>>  struct list_head addr_space_list;
>> +struct vfio_domain  *external_domain; /* domain for external user */
>>  struct mutex lock;
>>  struct rb_root  dma_list;
>>  bool v2;
>> @@ -67,6 +69,9 @@ struct vfio_addr_space {
>>  struct mm_struct *mm;
>>  struct list_head next;
>>  atomic_t ref_count;
>> +/* external user pinned pfns */
>> +struct rb_root  pfn_list;   /* pinned Host pfn list */
>> +struct mutex pfn_list_lock;  /* mutex for pfn_list */
>>  };
>>  
>>  struct vfio_domain {
>> @@ -83,6 +88,7 @@ struct vfio_dma {
>>  unsigned long   vaddr;  /* Process virtual addr */
>>  size_t  size;   /* Map size (bytes) */
>>  int prot;   /* IOMMU_READ/WRITE */
>> +bool iommu_mapped;
>>  struct vfio_addr_space  *addr_space;
>>  struct task_struct  *task;
>>  bool mlock_cap;
>> @@ -94,6 +100,19 @@ struct vfio_group {
>>  };
>>  
>>  /*
>> + * Guest RAM pinning working set or DMA target
>> + */
>> +struct vfio_pfn {
>> +struct rb_node  node;
>> +unsigned long   pfn; /* Host pfn */
>> +int prot;
>> +atomic_t ref_count;
>> +};
>> +
>> +#define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \
>> +(!list_empty(&iommu->domain_list))
>> +
>> +/*
>>   * This code handles mapping and unmapping of user data buffers
>>   * into DMA'ble space using the IOMMU
>>   */
>> @@ -153,6 +172,93 @@ static struct vfio_addr_space *vfio_find_addr_space(struct vfio_iommu *iommu,
>>  return NULL;
>>  }
>>  
>> +/*
>> + * Helper Functions for host pfn list
>> + */
>> +static struct vfio_pfn *vfio_find_pfn(struct vfio_addr_space *addr_space,
>> +  unsigned long pfn)
>> +{
>> +struct vfio_pfn *vpfn;
>> +struct rb_node *node = addr_space->pfn_list.rb_node;
>> +
>> +while (node) {
>> +vpfn = rb_entry(node, struct vfio_pfn, node);
>> +
>> +if (pfn < vpfn->pfn)
>> +node = node->rb_left;
>> +else if (pfn > vpfn->pfn)
>> +node =
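
The pfn_list tracking described in the quoted commit message (pinned host
pfns kept in a search tree with a reference count, unpin requests verified
against it, leftovers released on detach) can be sketched in userspace C
roughly as below. This is only an illustration, not the patch's code: it uses
a plain unbalanced binary search tree instead of the kernel rb-tree, has no
locking or page accounting, and every name (pfn_node, pfn_pin, pfn_unpin,
pfn_unpin_all) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* One tracked host pfn; plain BST node standing in for the rb-tree entry. */
struct pfn_node {
    unsigned long pfn;
    int ref_count;
    struct pfn_node *left, *right;
};

/* Lookup by pfn, same left/right descent as the quoted helper. */
static struct pfn_node *pfn_find(struct pfn_node *root, unsigned long pfn)
{
    while (root) {
        if (pfn < root->pfn)
            root = root->left;
        else if (pfn > root->pfn)
            root = root->right;
        else
            return root;
    }
    return NULL;
}

/* Pin: take a reference, creating the tracking node on first use. */
static int pfn_pin(struct pfn_node **root, unsigned long pfn)
{
    struct pfn_node **link = root;

    assert(root);
    while (*link) {
        if (pfn < (*link)->pfn) {
            link = &(*link)->left;
        } else if (pfn > (*link)->pfn) {
            link = &(*link)->right;
        } else {
            (*link)->ref_count++;
            return 0;
        }
    }
    *link = calloc(1, sizeof(**link));
    if (!*link)
        return -ENOMEM;
    (*link)->pfn = pfn;
    (*link)->ref_count = 1;
    return 0;
}

/* Unlink the node at *link from the tree and free it. */
static void pfn_erase(struct pfn_node **link)
{
    struct pfn_node *node = *link;

    if (!node->left) {
        *link = node->right;
    } else if (!node->right) {
        *link = node->left;
    } else {
        /* Two children: splice in the in-order successor. */
        struct pfn_node **succ = &node->right;
        struct pfn_node *s;

        while ((*succ)->left)
            succ = &(*succ)->left;
        s = *succ;
        *succ = s->right;
        s->left = node->left;
        s->right = node->right;
        *link = s;
    }
    free(node);
}

/* Unpin: reject pfns that were never pinned; drop the node at zero refs. */
static int pfn_unpin(struct pfn_node **root, unsigned long pfn)
{
    struct pfn_node **link = root;

    while (*link && (*link)->pfn != pfn)
        link = pfn < (*link)->pfn ? &(*link)->left : &(*link)->right;
    if (!*link)
        return -EINVAL;
    if (--(*link)->ref_count == 0)
        pfn_erase(link);
    return 0;
}

/* Detach: release whatever is still tracked; returns the nodes freed. */
static int pfn_unpin_all(struct pfn_node **root)
{
    int freed = 0;

    while (*root) {
        pfn_erase(root);
        freed++;
    }
    return freed;
}
```

A caller would pin with pfn_pin(&root, pfn), treat -EINVAL from pfn_unpin()
as a bogus unpin request, and call pfn_unpin_all() when the group is
detached.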

Re: [ANNOUNCE] 2016-Q3 release of KVMGT (Was Re: KVMGT - the implementation of ...)

2016-11-06 Thread Jike Song

Hi all,

While our upstreaming effort continues, we are pleased to announce another
update of Intel GVT-g for KVM.

Intel GVT-g for KVM (a.k.a. KVMGT) is a full GPU virtualization solution with 
mediated pass-through, starting from 5th generation Intel Core(TM) processors 
with Intel Graphics processors.  A virtual GPU instance is maintained for each 
VM, with part of performance critical resources directly assigned. The 
capability of running native graphics driver inside a VM, without hypervisor 
intervention in performance critical paths, achieves a good balance among 
performance, feature, and sharing capability.

Repositories:

- Kernel: https://github.com/01org/igvtg-kernel (2016q3-4.3.0 branch)
- Qemu: https://github.com/01org/igvtg-qemu (2016q3-2.3.0 branch)

This update consists of:
- Preliminary support for a new platform: KabyLake. For Windows guests, only
  64-bit Windows 10 RedStone is supported.
- Windows 10 RedStone guest support
- Preliminary Windows guest QoS support: administrators can now cap the
  maximum amount of vGPU resources consumed by each VM, with a value from
  1% to 99%
- Preliminary display virtualization support: besides tracking display
  register accesses in the guest VM, irrelevant display pipeline information
  between the host and the guest VM is removed
- Preliminary Live Migration and savevm/restorevm support on BDW with 2D/3D
  workloads running

Known issues:
- At least 2GB of memory is suggested for a guest virtual machine (win7-32/64,
  win8.1-64, win10-64) to run most 3D workloads
- Fast boot in Windows 8 and later is not supported; the workaround is to
  disable power S3/S4 in the HVM file by adding "acpi_S3=0, acpi_S4=0"
- Sometimes, when dom0 and the guest both have heavy workloads, i915 in dom0
  triggers a false-alarm TDR. The workaround is to disable hangcheck in the
  dom0 grub file by adding "i915.enable_hangcheck=0"
- Stability: when the QoS feature is enabled, a full GPU reset of the Windows
  guest is often triggered during MTBF testing. This bug will be fixed in the
  next release
- OpenCL allocations in a Windows guest can crash the host; the workaround is
  to disable logd in the dom0 grub file by adding "i915.logd_enable=0"

Please subscribe to the mailing list: https://lists.01.org/mailman/listinfo/igvt-g

Official iGVT-g portal: https://01.org/igvt-g

More information about the background, architecture, and other aspects of
Intel GVT-g can be found at:

http://www.linux-kvm.org/images/f/f3/01x08b-KVMGT-a.pdf
https://www.usenix.org/conference/atc14/technical-sessions/presentation/tian


Note:
The KVMGT project should be considered a work in progress. As such it is not a 
complete product nor should it be considered one. Extra care should be taken 
when testing and configuring a system to use the KVMGT project.

--
Thanks,
Jike

On 07/20/2016 12:52 PM, Jike Song wrote:
> Hi all,
> 
> We are pleased to announce another update of Intel GVT-g for KVM.
> 
> Intel GVT-g for KVM (a.k.a. KVMGT) is a full GPU virtualization solution with 
> mediated pass-through, starting from 5th generation Intel Core™ processors 
> with Intel Graphics processors.  A virtual GPU instance is maintained for 
> each VM, with part of performance critical resources directly assigned. The 
> capability of running native graphics driver inside a VM, without hypervisor 
> intervention in performance critical paths, achieves a good balance among 
> performance, feature, and sharing capability.
> 
> Repositories:
> 
>- Kernel: https://github.com/01org/igvtg-kernel (2016q2-4.3.0 branch)
>- Qemu: https://github.com/01org/igvtg-qemu (2016q2-2.3.0 branch)
> 
> This update consists of:
>- KVMGT stable release for Xeon E3 v4 (Broadwell), E3 v5(Skylake), Intel 
> Core™ processors 5th generation (Boadwell) , 6th generation (Skylake)
>- 2D/3D/Media workloads can run simultaneously in multiple guests
> 
> Known issues:
>- At least 2GB memory is suggested for Guest Virtual Machine (VM) to run 
> most 3D workloads.
>- Using Windows Media Player play videos may cause host crash. Using VLC 
> to play .ogg file may cause mosaic or slow response.
>- Suggest to X window mode like xinit instead of lightdm to launch host if 
> running heavy workload in both guest and host for more than 6 hours.
>- Change i915.preemption_policy=3 in host kernel cmdline, if you see 
> problem when running heavy 3D workloads in multiple Guests (>=3) in some 
> extreme stress configuration.
> 
> 
> Please subscribe to join the mailing list:
>- https://lists.01.org/mailman/listinfo/igvt-g
> 
> Official iGVT-g portal:
>- https://01.org/igvt-g
> 
> More information about background, architecture and others about Intel GVT-g, 
> can be found at:
> 
> http://www.linux-kvm.org/images/f/f3/01x08b-KVMGT-a.pdf
> 
> https://www.usenix.

Re: [Intel-gfx] [Announcement] 2016-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

2016-11-06 Thread Jike Song

Hi all,

We are pleased to announce another update of Intel GVT-g for Xen.

Intel GVT-g is a full GPU virtualization solution with mediated pass-through, 
starting from 4th generation Intel Core(TM) processors with Intel Graphics 
processors. A virtual GPU instance is maintained for each VM, with part of 
performance critical resources directly assigned. The capability of running 
native graphics driver inside a VM, without hypervisor intervention in 
performance critical paths, achieves a good balance among performance, feature, 
and sharing capability. Xen is currently supported on Intel Processor Graphics 
(a.k.a. XenGT).


Repositories

- Xen: https://github.com/01org/igvtg-xen (2016q3-4.6 branch)
- Kernel: https://github.com/01org/igvtg-kernel (2016q3-4.3.0 branch)
- Qemu: https://github.com/01org/igvtg-qemu (2016q3-2.3.0 branch)


This update consists of:

- Preliminary support for a new platform: 7th generation Intel® Core™
  processors. For Windows guests, only 64-bit Windows 10 RedStone is
  supported.

- Windows 10 RedStone guest support

- Preliminary Windows guest QoS support: administrators can now cap the
  maximum amount of vGPU resources consumed by each VM, with a value from
  1% to 99%

- Preliminary display virtualization support: besides tracking display
  register accesses in the guest VM, irrelevant display pipeline information
  between the host and the guest VM is removed

- Preliminary Live Migration and savevm/restorevm support on BDW with 2D/3D
  workloads running



Known issues:

- At least 2GB of memory is suggested for a guest virtual machine (win7-32/64,
  win8.1-64, win10-64) to run most 3D workloads

- Fast boot in Windows 8 and later is not supported; the workaround is to
  disable power S3/S4 in the HVM file by adding “acpi_S3=0, acpi_S4=0”

- Sometimes, when dom0 and the guest both have heavy workloads, i915 in dom0
  triggers a false-alarm TDR. The workaround is to disable hangcheck in the
  dom0 grub file by adding “i915.enable_hangcheck=0”

- Stability: when the QoS feature is enabled, a full GPU reset of the Windows
  guest is often triggered during MTBF testing. This bug will be fixed in the
  next release

- OpenCL allocations in a Windows guest can crash the host; the workaround is
  to disable logd in the dom0 grub file by adding “i915.logd_enable=0”


The next update is expected around early January 2017.


GVT-g project portal: https://01.org/igvt-g
Please subscribe to the mailing list: https://lists.01.org/mailman/listinfo/igvt-g


More information about the background, architecture, and other aspects of
Intel GVT-g can be found at:

https://01.org/igvt-g
https://www.usenix.org/conference/atc14/technical-sessions/presentation/tian

http://events.linuxfoundation.org/sites/events/files/slides/XenGT-Xen%20Summit-v7_0.pdf

http://events.linuxfoundation.org/sites/events/files/slides/XenGT-Xen%20Summit-REWRITE%203RD%20v4.pdf
https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization-xengt


Note: The XenGT project should be considered a work in progress. As such it is 
not a complete product nor should it be considered one. Extra care should be 
taken when testing and configuring a system to use the XenGT project.

--
Thanks,
Jike

On 07/22/2016 01:42 PM, Jike Song wrote:
> Hi all,
> 
> We are pleased to announce another update of Intel GVT-g for Xen.
> 
> Intel GVT-g is a full GPU virtualization solution with mediated pass-through, 
> starting from 4th generation Intel Core(TM) processors with Intel Graphics 
> processors. A virtual GPU instance is maintained for each VM, with part of 
> performance critical resources directly assigned. The capability of running 
> native graphics driver inside a VM, without hypervisor intervention in 
> performance critical paths, achieves a good balance among performance, 
> feature, and sharing capability. Xen is currently supported on Intel 
> Processor Graphics (a.k.a. XenGT).
> 
> Repositories
> -Xen: https://github.com/01org/igvtg-xen (2016q2-4.6 branch)
> -Kernel: https://github.com/01org/igvtg-kernel (2016q2-4.3.0 branch)
> -Qemu: https://github.com/01org/igvtg-qemu (2016q2-2.3.0 branch)
> 
> This update consists of:
> -Support Windows 10 guest
> -Support Windows Graphics driver installation on both Windows Normal mode 
> and Safe mode
> 
> Known issues:
> -   At least 2GB memory is suggested for Guest Virtual Machine (VM) to run 
> most 3D workloads
> -   Dom0 S3 related feature is not supported
> -   Windows 8 and later versions: fast boot is not supported, the workaround 
> is to disable power S3/S4 in HVM file by adding "acpi_S3=0, acpi_S4=0"
> -   Using Windows Media Player play videos may cause host crash. Using VLC to 
> play .ogg file may cause mosaic or slow response.
> -   Sometimes when both dom0 and guest have heavy workloads, i915 in dom0 
> will trigger a false graphics reset,
> the workaround is to disab

Re: [PATCH v10 08/19] vfio iommu type1: Add find_iommu_group() function

2016-11-02 Thread Jike Song

On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
> Add find_iommu_group()
> 
> Signed-off-by: Kirti Wankhede <kwankh...@nvidia.com>
> Signed-off-by: Neo Jia <c...@nvidia.com>
> Change-Id: I9d372f1ebe9eb01a5a21374b8a2b03f7df73601f
> ---
>  drivers/vfio/vfio_iommu_type1.c | 58 
> -
>  1 file changed, 34 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 3d916b965492..861ac2a1b0c3 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -745,11 +745,24 @@ static void vfio_test_domain_fgsp(struct vfio_domain 
> *domain)
>   __free_pages(pages, order);
>  }
>  
> +static struct vfio_group *find_iommu_group(struct vfio_domain *domain,
> +struct iommu_group *iommu_group)
> +{
> + struct vfio_group *g;
> +
> + list_for_each_entry(g, &domain->group_list, next) {
> + if (g->iommu_group == iommu_group)
> + return g;
> + }
> +
> + return NULL;
> +}
> +
>  static int vfio_iommu_type1_attach_group(void *iommu_data,
>struct iommu_group *iommu_group)
>  {
>   struct vfio_iommu *iommu = iommu_data;
> - struct vfio_group *group, *g;
> + struct vfio_group *group;
>   struct vfio_domain *domain, *d;
>   struct bus_type *bus = NULL;
>   int ret;
> @@ -757,10 +770,7 @@ static int vfio_iommu_type1_attach_group(void 
> *iommu_data,
>   mutex_lock(&iommu->lock);
>  
>   list_for_each_entry(d, &iommu->domain_list, next) {
> - list_for_each_entry(g, &d->group_list, next) {
> - if (g->iommu_group != iommu_group)
> - continue;
> -
> + if (find_iommu_group(d, iommu_group)) {
>   mutex_unlock(&iommu->lock);
>   return -EINVAL;
>   }
> @@ -879,28 +889,28 @@ static void vfio_iommu_type1_detach_group(void 
> *iommu_data,
>  
>   mutex_lock(&iommu->lock);
>  
> +
>   list_for_each_entry(domain, &iommu->domain_list, next) {
> - list_for_each_entry(group, &domain->group_list, next) {
> - if (group->iommu_group != iommu_group)
> - continue;
> + group = find_iommu_group(domain, iommu_group);
> + if (!group)
> + continue;
>  
> - iommu_detach_group(domain->domain, iommu_group);
> - list_del(&group->next);
> - kfree(group);
> - /*
> -  * Group ownership provides privilege, if the group
> -  * list is empty, the domain goes away.  If it's the
> -  * last domain, then all the mappings go away too.
> -  */
> - if (list_empty(&domain->group_list)) {
> - if (list_is_singular(&iommu->domain_list))
> - vfio_iommu_unmap_unpin_all(iommu);
> - iommu_domain_free(domain->domain);
> - list_del(&domain->next);
> - kfree(domain);
> - }
> - goto done;
> + iommu_detach_group(domain->domain, iommu_group);
> + list_del(&group->next);
> + kfree(group);
> + /*
> +  * Group ownership provides privilege, if the group
> +  * list is empty, the domain goes away.  If it's the
> +  * last domain, then all the mappings go away too.
> +  */
> + if (list_empty(&domain->group_list)) {
> +     if (list_is_singular(&iommu->domain_list))
> + vfio_iommu_unmap_unpin_all(iommu);
> + iommu_domain_free(domain->domain);
> + list_del(&domain->next);
> + kfree(domain);
>   }
> + goto done;
>   }
>  
>  done:
> 

Reviewed-by: Jike Song <jike.s...@intel.com>

--
Thanks,
Jike
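
The refactoring reviewed above pulls a nested list scan into one lookup
helper that both the attach and the detach path can call. The following is
only a userspace sketch of that idea, not the kernel code: the types
group_model and domain_model, the integer group id, and the helper names
find_group(), group_is_attached() and find_owner() are all hypothetical
stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real VFIO structures. */
struct group_model {
	int iommu_group_id;           /* stands in for the iommu_group pointer */
	struct group_model *next;
};

struct domain_model {
	struct group_model *groups;   /* groups attached to this domain */
	struct domain_model *next;
};

/* The extracted helper: the one place that knows how to scan a domain. */
static struct group_model *find_group(const struct domain_model *d, int id)
{
	struct group_model *g;

	assert(d != NULL);
	for (g = d->groups; g; g = g->next) {
		if (g->iommu_group_id == id)
			return g;
	}
	return NULL;
}

/* Attach-style caller: only needs a yes/no answer. */
static int group_is_attached(const struct domain_model *domains, int id)
{
	const struct domain_model *d;

	for (d = domains; d; d = d->next) {
		if (find_group(d, id))
			return 1;
	}
	return 0;
}

/* Detach-style caller: needs the domain that owns the group. */
static struct domain_model *find_owner(struct domain_model *domains, int id)
{
	struct domain_model *d;

	for (d = domains; d; d = d->next) {
		if (find_group(d, id))
			return d;
	}
	return NULL;
}
```

Both callers become a single loop with an early exit, which is the same
flattening the quoted diff applies to the detach path.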

Re: [Qemu-devel] [PATCH v9 04/12] vfio iommu: Add support for mediated devices

2016-11-02 Thread Jike Song

On 11/02/2016 09:18 PM, Kirti Wankhede wrote:
> On 11/2/2016 6:30 PM, Jike Song wrote:
>> On 11/02/2016 08:41 PM, Kirti Wankhede wrote:
>>> On 11/2/2016 5:51 PM, Jike Song wrote:
>>>> On 11/02/2016 12:09 PM, Alexey Kardashevskiy wrote:
>>>>> Or you could just reference and use @mm as KVM and others do. Or there is
>>>>> anything else you need from @current than just @mm?
>>>>>
>>>>
>>>> I agree. If @mm is the only thing needed, there is really no reason to
>>>> refer to the @task :-)
>>>>
>>>
>>> In vfio_lock_acct(), that is for page accounting, if mm->mmap_sem is
>>> already held then page accounting is deferred, where task structure is
>>> used to get mm and work is deferred only if mm exist:
>>> mm = get_task_mm(task);
>>>
>>> That is where this module need task structure.
>>
>> Kirti,
>>
>> By calling get_task_mm you hold a ref on @mm and save it in iommu,
>> whenever you want to do something like vfio_lock_acct(), use that mm
>> (as you said, if mmap_sem not accessible then defer it to a work, but
>> still @mm is the whole information), and put it after the usage.
>>
>> I still can't see any reason that the @task have to be saved. It's
>> always the @mm all the time. Did I miss anything?
>>
> 
> If the process is terminated by SIGKILL, as Alexey mentioned in this
> mail thread earlier exit_mm() is called first and then all files are
> closed. From exit_mm(), task->mm is set to NULL. So from teardown path,
> we should call get_task_mm(task) to get current status instead of using
> stale pointer.

You have got a ref on task->mm and stored it somewhere; some time after
that, task->mm was set to NULL -- what exactly is the problem here? It's
perfectly okay per my understanding ...

--
Thanks,
Jike
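
To make the point of the reply above concrete, here is a minimal
single-threaded userspace model of reference counting. It is a sketch under
stated assumptions, not kernel code: struct mm_model, struct task_model and
the helpers mm_get(), mm_put(), task_create() and task_exit_mm() are
hypothetical names, the counter is a plain int rather than an atomic, and
the model does not distinguish the kernel's separate mm counters.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model; not the kernel's mm_struct or task_struct. */
struct mm_model {
	int users;          /* reference count (plain int: single-threaded model) */
	int locked_pages;   /* stand-in for accounting state */
};

struct task_model {
	struct mm_model *mm;   /* cleared when the task exits */
};

/* Take an extra reference; returns the same pointer for convenience. */
static struct mm_model *mm_get(struct mm_model *mm)
{
	if (mm)
		mm->users++;
	return mm;
}

/* Drop one reference; frees the object on the last put. Returns 1 if freed. */
static int mm_put(struct mm_model *mm)
{
	assert(mm != NULL && mm->users > 0);
	if (--mm->users == 0) {
		free(mm);
		return 1;
	}
	return 0;
}

/* Create a task that owns one reference on a fresh mm. */
static struct task_model task_create(void)
{
	struct task_model t;

	t.mm = calloc(1, sizeof(*t.mm));
	assert(t.mm != NULL);
	t.mm->users = 1;
	return t;
}

/* Exit path of the model: the task drops only its own reference. */
static void task_exit_mm(struct task_model *t)
{
	struct mm_model *mm = t->mm;

	t->mm = NULL;
	if (mm)
		mm_put(mm);
}
```

In this model, a holder that called mm_get() before task_exit_mm() still has
a valid object afterwards, even though the task's own pointer is NULL; the
object is freed only when that holder calls mm_put().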

Re: [PATCH v10 09/19] vfio iommu type1: Add support for mediated devices

2016-11-02 Thread Jike Song

On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
> VFIO IOMMU drivers are designed for the devices which are IOMMU capable.
> Mediated device only uses IOMMU APIs, the underlying hardware can be
> managed by an IOMMU domain.
> 
> Aim of this change is:
> - To use most of the code of TYPE1 IOMMU driver for mediated devices
> - To support direct assigned device and mediated device in single module
> 
> This change adds pin and unpin support for mediated device to TYPE1 IOMMU
> backend module. More details:
> - When iommu_group of mediated devices is attached, task structure is
>   cached which is used later to pin pages and page accounting.
> - It keeps track of pinned pages for mediated domain. This data is used to
>   verify unpinning request and to unpin remaining pages while detaching, if
>   there are any.
> - Used existing mechanism for page accounting. If iommu capable domain
>   exist in the container then all pages are already pinned and accounted.
>   Accouting for mdev device is only done if there is no iommu capable
>   domain in the container.
> - Page accouting is updated on hot plug and unplug mdev device and pass
>   through device.
> 
> Tested by assigning below combinations of devices to a single VM:
> - GPU pass through only
> - vGPU device only
> - One GPU pass through and one vGPU device
> - Linux VM hot plug and unplug vGPU device while GPU pass through device
>   exist
> - Linux VM hot plug and unplug GPU pass through device while vGPU device
>   exist
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: I295d6f0f2e0579b8d9882bfd8fd5a4194b97bd9a
> ---
>  drivers/vfio/vfio_iommu_type1.c | 646 
> +++-
>  1 file changed, 571 insertions(+), 75 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 861ac2a1b0c3..5add11a147e1 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -36,6 +36,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson "
> @@ -55,18 +56,26 @@ MODULE_PARM_DESC(disable_hugepages,
>  
>  struct vfio_iommu {
>   struct list_head    domain_list;
> + struct vfio_domain  *external_domain; /* domain for external user */
>   struct mutex        lock;
>   struct rb_root      dma_list;
>   bool                v2;
>   bool                nesting;
>  };
>  
> +struct external_addr_space {
> + struct task_struct  *task;
> + struct rb_root      pfn_list;       /* pinned Host pfn list */
> + struct mutex        pfn_list_lock;  /* mutex for pfn_list */
> +};
> +
>  struct vfio_domain {
> - struct iommu_domain *domain;
> - struct list_head    next;
> - struct list_head    group_list;
> - int                 prot;   /* IOMMU_CACHE */
> - bool                fgsp;   /* Fine-grained super pages */
> + struct iommu_domain         *domain;
> + struct list_head            next;
> + struct list_head            group_list;
> + struct external_addr_space  *external_addr_space;
> + int                         prot;   /* IOMMU_CACHE */
> + bool                        fgsp;   /* Fine-grained super pages */
>  };
>  
>  struct vfio_dma {
> @@ -75,6 +84,7 @@ struct vfio_dma {
>   unsigned long   vaddr;  /* Process virtual addr */
>   size_t  size;   /* Map size (bytes) */
>   int prot;   /* IOMMU_READ/WRITE */
> + bool            iommu_mapped;
>  };
>  
>  struct vfio_group {
> @@ -83,6 +93,21 @@ struct vfio_group {
>  };
>  
>  /*
> + * Guest RAM pinning working set or DMA target
> + */
> +struct vfio_pfn {
> + struct rb_node  node;
> + unsigned long   vaddr;  /* virtual addr */
> + dma_addr_t      iova;   /* IOVA */
> + unsigned long   pfn;    /* Host pfn */
> + int             prot;
> + atomic_t        ref_count;
> +};
> +
> +#define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
> + (!list_empty(&iommu->domain_list))
> +
> +/*
>   * This code handles mapping and unmapping of user data buffers
>   * into DMA'ble space using the IOMMU
>   */
> @@ -130,6 +155,101 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
> struct vfio_dma *old)
>   rb_erase(&old->node, &iommu->dma_list);
>  }
>  
> +/*
> + * Helper Functions for host pfn list
> + */
> +
> +static struct vfio_pfn *vfio_find_pfn(struct vfio_domain *domain,
> +   unsigned long pfn)
> +{
> + struct rb_node *node;
> + struct vfio_pfn *vpfn;
> +
> + node = domain->external_addr_space->pfn_list.rb_node;
> +
> + while (node) {
> + vpfn = rb_entry(node, struct vfio_pfn, node);
> +
>

Re: [Qemu-devel] [PATCH v9 04/12] vfio iommu: Add support for mediated devices

2016-11-02 Thread Jike Song

On 11/02/2016 08:41 PM, Kirti Wankhede wrote:
> On 11/2/2016 5:51 PM, Jike Song wrote:
>> On 11/02/2016 12:09 PM, Alexey Kardashevskiy wrote:
>>> Or you could just reference and use @mm as KVM and others do. Or there is
>>> anything else you need from @current than just @mm?
>>>
>>
>> I agree. If @mm is the only thing needed, there is really no reason to
>> refer to the @task :-)
>>
> 
> In vfio_lock_acct(), that is for page accounting, if mm->mmap_sem is
> already held then page accounting is deferred, where task structure is
> used to get mm and work is deferred only if mm exist:
>   mm = get_task_mm(task);
> 
> That is where this module need task structure.

Kirti,

By calling get_task_mm() you hold a ref on @mm and can save it in the iommu
structure. Whenever you want to do something like vfio_lock_acct(), use that
mm (as you said, if mmap_sem is not available, defer it to a work item; @mm
is still all the information needed), and put it after use.

I still can't see any reason why the @task has to be saved. It's
the @mm all the time. Did I miss anything?

--
Thanks,
Jike

Re: [Qemu-devel] [PATCH v9 04/12] vfio iommu: Add support for mediated devices

2016-11-02 Thread Jike Song

On 11/02/2016 12:09 PM, Alexey Kardashevskiy wrote:
> On 02/11/16 14:29, Kirti Wankhede wrote:
>>
>>
>> On 11/2/2016 6:54 AM, Alexey Kardashevskiy wrote:
>>> On 02/11/16 01:01, Kirti Wankhede wrote:


>>>> On 10/28/2016 7:48 AM, Alexey Kardashevskiy wrote:
>>>>> On 27/10/16 23:31, Kirti Wankhede wrote:
>>>>>>
>>>>>>
>>>>>> On 10/27/2016 12:50 PM, Alexey Kardashevskiy wrote:
>>>>>>> On 18/10/16 08:22, Kirti Wankhede wrote:
>>>>>>>> VFIO IOMMU drivers are designed for the devices which are IOMMU 
>>>>>>>> capable.
>>>>>>>> Mediated device only uses IOMMU APIs, the underlying hardware can be
>>>>>>>> managed by an IOMMU domain.
>>>>>>>>
>>>>>>>> Aim of this change is:
>>>>>>>> - To use most of the code of TYPE1 IOMMU driver for mediated devices
>>>>>>>> - To support direct assigned device and mediated device in single 
>>>>>>>> module
>>>>>>>>
>>>>>>>> Added two new callback functions to struct vfio_iommu_driver_ops. 
>>>>>>>> Backend
>>>>>>>> IOMMU module that supports pining and unpinning pages for mdev devices
>>>>>>>> should provide these functions.
>>>>>>>> Added APIs for pining and unpining pages to VFIO module. These calls 
>>>>>>>> back
>>>>>>>> into backend iommu module to actually pin and unpin pages.
>>>>>>>>
>>>>>>>> This change adds pin and unpin support for mediated device to TYPE1 
>>>>>>>> IOMMU
>>>>>>>> backend module. More details:
>>>>>>>> - When iommu_group of mediated devices is attached, task structure is
>>>>>>>>   cached which is used later to pin pages and page accounting.
>>>>>>>
>>>>>>>
>>>>>>> For SPAPR TCE IOMMU driver, I ended up caching mm_struct with
>>>>>>> atomic_inc(&current->mm->mm_count) (patches are on the way) instead of
>>>>>>> using @current or task as the process might be gone while VFIO 
>>>>>>> container is
>>>>>>> still alive and @mm might be needed to do proper cleanup; this might 
>>>>>>> not be
>>>>>>> an issue with this patchset now but still you seem to only use @mm from
>>>>>>> task_struct.
>>>>>>>
>>>>>>
>>>>>> Consider the example of QEMU process which creates VFIO container, QEMU
>>>>>> in its teardown path would release the container. How could container be
>>>>>> alive when process is gone?
>>>>>
>>>>> do_exit() in kernel/exit.c calls exit_mm() (which sets NULL to tsk->mm)
>>>>> first, and then releases open files by calling  exit_files(). So
>>>>> container's release() does not have current->mm.
>>>>>
>>>>
>>>> Incrementing usage count (get_task_struct()) while saving task structure
>>>> and decementing it (put_task_struct()) from release() should  work here.
>>>> Updating the patch.
>>>
>>> I cannot see how the task->usage counter prevents do_exit() from performing
>>> the exit, can you?
>>>
>>
>> It will not prevent exit from do_exit(), but that will make sure that we
>> don't have stale pointer of task structure. Then we can check whether
>> the task is alive and get mm pointer in teardown path as below:
> 
> 
> Or you could just reference and use @mm as KVM and others do. Or there is
> anything else you need from @current than just @mm?
> 

I agree. If @mm is the only thing needed, there is really no reason to
refer to the @task :-)

--
Thanks,
Jike
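
The ordering argument quoted in this thread (exit_mm() runs before
exit_files(), so a release() callback no longer sees the task's mm) can be
shown with a small userspace model. This is a sketch under assumptions, not
kernel code: the types and the functions model_exit_mm(), model_exit_files()
and model_do_exit() are hypothetical, only the call order is taken from the
quoted description, and reference counting on the cached pointer is left out
on purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; only the call ordering mirrors the thread. */
struct mm_model {
	int id;
};

struct container_model {
	struct mm_model *cached_mm;     /* saved when the container was set up */
	int task_mm_seen_in_release;    /* 1 if the task still had an mm */
	int cached_mm_seen_in_release;  /* 1 if the cached pointer was usable */
};

struct task_model {
	struct mm_model *mm;
	struct container_model *open_file;  /* a single open "file" for brevity */
};

/* What a release() callback can observe when it finally runs. */
static void container_release(struct task_model *t, struct container_model *c)
{
	assert(t != NULL && c != NULL);
	c->task_mm_seen_in_release = (t->mm != NULL);
	c->cached_mm_seen_in_release = (c->cached_mm != NULL);
}

/* Step 1 of the modelled exit: the task loses its mm pointer. */
static void model_exit_mm(struct task_model *t)
{
	t->mm = NULL;
}

/* Step 2 of the modelled exit: open files are released. */
static void model_exit_files(struct task_model *t)
{
	if (t->open_file) {
		container_release(t, t->open_file);
		t->open_file = NULL;
	}
}

/* The order described in the thread: mm first, files second. */
static void model_do_exit(struct task_model *t)
{
	model_exit_mm(t);
	model_exit_files(t);
}
```

In the model, release() finds the task's mm pointer already cleared while the
pointer cached at setup time is still there, which is why the thread
converges on saving @mm rather than @task. In real code the cached pointer is
only safe to use if a reference was taken when it was saved.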

Re: [PATCH v10 04/19] vfio: Common function to increment container_users

2016-11-02 Thread Jike Song

On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
> This change rearrange functions to have common function to increment
> container_users
> 
> Signed-off-by: Kirti Wankhede <kwankh...@nvidia.com>
> Signed-off-by: Neo Jia <c...@nvidia.com>
> Change-Id: I8bdeb352bc8439b107ffd519480fd4dc238677f2
> ---
>  drivers/vfio/vfio.c | 34 +-
>  1 file changed, 21 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 23bc86c1d05d..2e83bdf007fe 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1385,6 +1385,23 @@ static bool vfio_group_viable(struct vfio_group *group)
>group, vfio_dev_viable) == 0);
>  }
>  
> +static int vfio_group_add_container_user(struct vfio_group *group)
> +{
> + if (!atomic_inc_not_zero(&group->container_users))
> + return -EINVAL;
> +
> + if (group->noiommu) {
> + atomic_dec(&group->container_users);
> + return -EPERM;
> + }

trivial: a blank line here

> + if (!group->container->iommu_driver || !vfio_group_viable(group)) {
> + atomic_dec(&group->container_users);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
>  static const struct file_operations vfio_device_fops;
>  
>  static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
> @@ -1694,23 +1711,14 @@ static const struct file_operations vfio_device_fops 
> = {
>  struct vfio_group *vfio_group_get_external_user(struct file *filep)
>  {
>   struct vfio_group *group = filep->private_data;
> + int ret;
>  
>   if (filep->f_op != &vfio_group_fops)
>   return ERR_PTR(-EINVAL);
>  
> - if (!atomic_inc_not_zero(&group->container_users))
> - return ERR_PTR(-EINVAL);
> -
> - if (group->noiommu) {
> - atomic_dec(&group->container_users);
> - return ERR_PTR(-EPERM);
> - }
> -
> - if (!group->container->iommu_driver ||
> - !vfio_group_viable(group)) {
> - atomic_dec(&group->container_users);
> - return ERR_PTR(-EINVAL);
> - }
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + return ERR_PTR(ret);
>  
>   vfio_group_get(group);
>  

Reviewed-by: Jike Song <jike.s...@intel.com>

--
Thanks,
Jike
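
The helper in the patch reviewed above follows a common pattern: take a
reference only if the counter is already non-zero, then undo the increment on
every later failure. Below is a userspace sketch of that pattern using C11
atomics. It is an illustration under assumptions rather than the kernel
implementation: struct group_model and its boolean fields are hypothetical
stand-ins for the checks in the diff, and inc_not_zero() is a hand-written
equivalent, not the kernel's atomic_inc_not_zero().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state the quoted helper inspects. */
struct group_model {
	atomic_int container_users;  /* 0 means no container is attached */
	bool noiommu;
	bool has_iommu_driver;
	bool viable;
};

/* Increment *v unless it is zero; returns true if the increment happened. */
static bool inc_not_zero(atomic_int *v)
{
	int old;

	assert(v != NULL);
	old = atomic_load(v);
	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Take a user reference, rolling it back if a later check fails. */
static int add_container_user(struct group_model *g)
{
	if (!inc_not_zero(&g->container_users))
		return -EINVAL;

	if (g->noiommu) {
		atomic_fetch_sub(&g->container_users, 1);
		return -EPERM;
	}

	if (!g->has_iommu_driver || !g->viable) {
		atomic_fetch_sub(&g->container_users, 1);
		return -EINVAL;
	}

	return 0;
}
```

Returning a plain error code, as the patch does, lets each caller decide how
to wrap it; the quoted vfio_group_get_external_user() turns it into
ERR_PTR(ret).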

Re: [PATCH v10 03/19] vfio: Rearrange functions to get vfio_group from dev

2016-11-02 Thread Jike Song

On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
> This patch rearranges functions to get vfio_group from device
> 
> Signed-off-by: Kirti Wankhede <kwankh...@nvidia.com>
> Signed-off-by: Neo Jia <c...@nvidia.com>
> Change-Id: I1f93262bdbab75094bc24b087b29da35ba70c4c6
> ---
>  drivers/vfio/vfio.c | 23 ---
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index d1d70e0b011b..23bc86c1d05d 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -480,6 +480,21 @@ static struct vfio_group *vfio_group_get_from_minor(int 
> minor)
>   return group;
>  }
>  
> +static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
> +{
> + struct iommu_group *iommu_group;
> + struct vfio_group *group;
> +
> + iommu_group = iommu_group_get(dev);
> + if (!iommu_group)
> + return NULL;
> +
> + group = vfio_group_get_from_iommu(iommu_group);
> + iommu_group_put(iommu_group);
> +
> + return group;
> +}
> +
>  /**
>   * Device objects - create, release, get, put, search
>   */
> @@ -811,16 +826,10 @@ EXPORT_SYMBOL_GPL(vfio_add_group_dev);
>   */
>  struct vfio_device *vfio_device_get_from_dev(struct device *dev)
>  {
> - struct iommu_group *iommu_group;
>   struct vfio_group *group;
>   struct vfio_device *device;
>  
> - iommu_group = iommu_group_get(dev);
> - if (!iommu_group)
> - return NULL;
> -
> - group = vfio_group_get_from_iommu(iommu_group);
> - iommu_group_put(iommu_group);
> + group = vfio_group_get_from_dev(dev);
>   if (!group)
>   return NULL;
>  

Reviewed-by: Jike Song <jike.s...@intel.com>

--
Thanks,
Jike

Re: [PATCH v10 02/19] vfio: VFIO based driver for Mediated devices

2016-11-02 Thread Jike Song

ent->ops->write))
> + return -EINVAL;
> +
> + return parent->ops->write(mdev, buf, count, ppos);
> +}
> +
> +static int vfio_mdev_mmap(void *device_data, struct vm_area_struct *vma)
> +{
> + struct mdev_device *mdev = device_data;
> + struct parent_device *parent = mdev->parent;
> +
> + if (unlikely(!parent->ops->mmap))
> + return -EINVAL;
> +
> + return parent->ops->mmap(mdev, vma);
> +}
> +
> +static const struct vfio_device_ops vfio_mdev_dev_ops = {
> + .name   = "vfio-mdev",
> + .open   = vfio_mdev_open,
> + .release= vfio_mdev_release,
> + .ioctl  = vfio_mdev_unlocked_ioctl,
> + .read   = vfio_mdev_read,
> + .write  = vfio_mdev_write,
> + .mmap   = vfio_mdev_mmap,
> +};
> +
> +int vfio_mdev_probe(struct device *dev)
> +{
> + struct mdev_device *mdev = to_mdev_device(dev);
> +
> + return vfio_add_group_dev(dev, &vfio_mdev_dev_ops, mdev);
> +}
> +
> +void vfio_mdev_remove(struct device *dev)
> +{
> + vfio_del_group_dev(dev);
> +}
> +
> +struct mdev_driver vfio_mdev_driver = {
> + .name   = "vfio_mdev",
> + .probe  = vfio_mdev_probe,
> + .remove = vfio_mdev_remove,
> +};
> +
> +static int __init vfio_mdev_init(void)
> +{
> + return mdev_register_driver(&vfio_mdev_driver, THIS_MODULE);
> +}
> +
> +static void __exit vfio_mdev_exit(void)
> +{
> + mdev_unregister_driver(&vfio_mdev_driver);
> +}
> +
> +module_init(vfio_mdev_init)
> +module_exit(vfio_mdev_exit)
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
>

Reviewed-by: Jike Song <jike.s...@intel.com>

--
Thanks,
Jike
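
The quoted vfio_mdev callbacks all follow one shape: check that the parent driver supplied the optional op, return -EINVAL if not, otherwise forward the call. A small userspace sketch of that dispatch pattern is below. The toy_* types and the vendor read function are hypothetical and exist only to make the example self-contained.

```c
#include <errno.h>
#include <stddef.h>

struct toy_mdev;

/* Hypothetical ops table; a vendor driver may leave entries NULL. */
struct toy_parent_ops {
	long (*read)(struct toy_mdev *mdev, char *buf, size_t count);
	int (*mmap)(struct toy_mdev *mdev, void *vma);
};

struct toy_mdev {
	const struct toy_parent_ops *ops;
};

/* Made-up vendor callback: fills the buffer with zeros. */
static long toy_vendor_read(struct toy_mdev *mdev, char *buf, size_t count)
{
	size_t i;

	(void)mdev;
	for (i = 0; i < count; i++)
		buf[i] = 0;

	return (long)count;
}

static const struct toy_parent_ops toy_vendor_ops = {
	.read = toy_vendor_read,
	/* .mmap intentionally left NULL */
};

/* Wrapper: reject the call if the vendor did not implement the op. */
static long toy_mdev_read(struct toy_mdev *mdev, char *buf, size_t count)
{
	if (!mdev->ops->read)
		return -EINVAL;

	return mdev->ops->read(mdev, buf, count);
}

static int toy_mdev_mmap(struct toy_mdev *mdev, void *vma)
{
	if (!mdev->ops->mmap)
		return -EINVAL;

	return mdev->ops->mmap(mdev, vma);
}
```

With this vendor table, toy_mdev_read() forwards to the vendor callback while toy_mdev_mmap() fails with -EINVAL because no mmap op was provided.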

Re: [Qemu-devel] [PATCH v10 01/19] vfio: Mediated device Core driver

2016-11-02 Thread Jike Song

On 11/02/2016 03:59 PM, Kirti Wankhede wrote:
> On 10/29/2016 11:41 PM, Jike Song wrote:
>> On 10/29/2016 06:06 PM, Kirti Wankhede wrote:
>>>
>>>
>>> On 10/29/2016 10:00 AM, Jike Song wrote:
>>>> On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
>>>>> +int mdev_register_device(struct device *dev, const struct parent_ops 
>>>>> *ops)
>>>>> +{
>>>>> + int ret;
>>>>> + struct parent_device *parent;
>>>>> +
>>>>> + /* check for mandatory ops */
>>>>> + if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + dev = get_device(dev);
>>>>> + if (!dev)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + mutex_lock(&parent_list_lock);
>>>>> +
>>>>> + /* Check for duplicate */
>>>>> + parent = __find_parent_device(dev);
>>>>> + if (parent) {
>>>>> + ret = -EEXIST;
>>>>> + goto add_dev_err;
>>>>> + }
>>>>> +
>>>>> + parent = kzalloc(sizeof(*parent), GFP_KERNEL);
>>>>> + if (!parent) {
>>>>> + ret = -ENOMEM;
>>>>> + goto add_dev_err;
>>>>> + }
>>>>> +
>>>>> + kref_init(&parent->ref);
>>>>> + mutex_init(&parent->lock);
>>>>> +
>>>>> + parent->dev = dev;
>>>>> + parent->ops = ops;
>>>>> +
>>>>> + ret = parent_create_sysfs_files(parent);
>>>>> + if (ret) {
>>>>> + mutex_unlock(&parent_list_lock);
>>>>> + mdev_put_parent(parent);
>>>>> + return ret;
>>>>> + }
>>>>> +
>>>>> + ret = class_compat_create_link(mdev_bus_compat_class, dev, NULL);
>>>>> + if (ret)
>>>>> + dev_warn(dev, "Failed to create compatibility class link\n");
>>>>> +
>>>>
>>>> Hi Kirti,
>>>>
>>>> Like I replied to previous version:
>>>>
>>>>http://www.spinics.net/lists/kvm/msg139331.html
>>>>
>>>
>>> Hi Jike,
>>>
>>> I saw your reply but by that time v10 version of patch series was out
>>> for review.
>>>
>>
>> Ah..yes, I forgot that :)
>>
>>>> You can always check if mdev_bus_compat_class already registered
>>>> here, and register it if not yet. Same logic should be adopted to
>>>> mdev_init.
>>>>
>>>> Current implementation will simply panic if configured as builtin,
>>>> which is rare but far from impossible.
>>>>
>>>
>>> Can you verify attached patch with v10 patch-set whether this works for you?
>>> I'll incorporate this change in my next version.
>>>
>>
>> Seems cool. But would you please also keep the register in mdev_init(),
>> just check the 'in case it was already registered' case? Thanks!
>>
> 
> The class is used only to keep symbolic links to the devices which are
> registered to mdev framework. So I think it's ok to register this class
> when first device is registered.
> 

That's also cool :-)

So if you like it:

Reviewed-by: Jike Song <jike.s...@intel.com>


--
Thanks,
Jike
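
The outcome of this exchange is to register the compatibility class lazily, when the first parent device is registered, so that the class pointer is never used before it exists. A minimal userspace sketch of that "ensure on first use" idea follows. The toy_* names and the counters are hypothetical, and the sketch assumes the caller already serializes registration (the quoted code holds a mutex around this path).

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the compat class; it only counts links. */
struct toy_class {
	int links;
};

static struct toy_class *toy_compat_class;	/* NULL until first use */
static int toy_class_registrations;		/* how often we registered */

/* Register the class only if that has not happened yet. */
static int toy_ensure_compat_class(void)
{
	if (toy_compat_class)
		return 0;

	toy_compat_class = calloc(1, sizeof(*toy_compat_class));
	if (!toy_compat_class)
		return -ENOMEM;

	toy_class_registrations++;
	return 0;
}

/* Device registration path: the class is guaranteed to exist before use. */
static int toy_register_device(void)
{
	int ret = toy_ensure_compat_class();

	if (ret)
		return ret;

	toy_compat_class->links++;
	return 0;
}
```

Because the check lives in the registration path itself, the result does not depend on the order in which module init functions run.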

Re: [Qemu-devel] [PATCH v10 00/19] Add Mediated device support

2016-11-01 Thread Jike Song

On 11/01/2016 11:24 PM, Gerd Hoffmann wrote:
>> I rebased KVMGT upon v10, with 2 minor changes:
>>
>>  1, get_user_pages_remote has only 7 args
> 
> Appears to be a 4.9 merge window change.  v10 as-is applies and builds
> fine against 4.8, after rebasing to 4.9-rc3 it stops building due to
> this.
> 
> Can you share the patch?
> 
>>  2, vfio iommu notifier calls vendor callback with iova instead of pfn
> 
> And this one too?
> 
> Also:  github seems to have the v9 kvmgt version still.  Can you push
> the update?

Zhenyu will help to push it to 01org/gvt-linux. Those 2 patches added by
me are trivial, I guess Kirti will have her formal fixes in next version.

> The kvmgt branch apparently depends on alot of unmerged stuff, "git
> describe" says 1669 patches on top of 4.8-rc8.
> 
> Can you outline what this is? Mostly drm-next?
> How much of this landed in the 4.9 merge window?

Yes, mostly drm-next, targeting 4.10 merge window. The overwhelming
part is the device-model under drivers/gpu/drm/i915/gvt/.
 

--
Thanks,
Jike

Re: [PATCH v10 00/19] Add Mediated device support

2016-11-01 Thread Jike Song

On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
> This series adds Mediated device support to Linux host kernel. Purpose
> of this series is to provide a common interface for mediated device
> management that can be used by different devices. This series introduces
> Mdev core module that creates and manages mediated devices, VFIO based
> driver for mediated devices that are created by mdev core module and
> update VFIO type1 IOMMU module to support pinning & unpinning for mediated
> devices.
> 
> What changed in v10?
> vfio:
>  - Split commits into multiple individual commits.
>  - Removed the function added in v9 to get device_api string.
>  - Defined constant strings in include/uapi/linux/vfio.h that should be used
>    by vendor driver for device_api attribute.
> 
> vfio_iommu_type1:
>  - Fixed accounting when pass through device is unplugged while mdev device
>    exists in a domain.
>  - Added blocking notifier to notify DMA_UNMAP to vendor driver to invalidate
>    mappings.
>  - Exported APIs to register notifier for DMA_UNMAP action.
> 
> Documentation:
>  - Added sysfs ABI for mediated device framework.
>  - Updated Documentation/vfio-mdev/vfio-mediated-device.txt.
>  - Updated mtty.c with bug fixes.
> 
> Kirti Wankhede (19):
>   vfio: Mediated device Core driver
>   vfio: VFIO based driver for Mediated devices
>   vfio: Rearrange functions to get vfio_group from dev
>   vfio: Common function to increment container_users
>   vfio iommu: Added pin and unpin callback functions to
>     vfio_iommu_driver_ops
>   vfio iommu type1: Update arguments of vfio_lock_acct
>   vfio iommu type1: Update argument of vaddr_get_pfn()
>   vfio iommu type1: Add find_iommu_group() function
>   vfio iommu type1: Add support for mediated devices
>   vfio iommu: Add blocking notifier to notify DMA_UNMAP
>   vfio: Introduce common function to add capabilities
>   vfio_pci: Update vfio_pci to use vfio_info_add_capability()
>   vfio: Introduce vfio_set_irqs_validate_and_prepare()
>   vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare()
>   vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare()
>   vfio: Define device_api strings
>   docs: Add Documentation for Mediated devices
>   docs: Sysfs ABI for mediated device framework
>   docs: Sample driver to demonstrate how to use Mediated device
>     framework.
> 
>  Documentation/ABI/testing/sysfs-bus-vfio-mdev    |  111 ++
>  Documentation/vfio-mdev/Makefile                 |   13 +
>  Documentation/vfio-mdev/mtty.c                   | 1503 ++
>  Documentation/vfio-mdev/vfio-mediated-device.txt |  398 ++
>  drivers/vfio/Kconfig                             |    1 +
>  drivers/vfio/Makefile                            |    1 +
>  drivers/vfio/mdev/Kconfig                        |   17 +
>  drivers/vfio/mdev/Makefile                       |    5 +
>  drivers/vfio/mdev/mdev_core.c                    |  384 ++
>  drivers/vfio/mdev/mdev_driver.c                  |  122 ++
>  drivers/vfio/mdev/mdev_private.h                 |   41 +
>  drivers/vfio/mdev/mdev_sysfs.c                   |  286
>  drivers/vfio/mdev/vfio_mdev.c                    |  148 +++
>  drivers/vfio/pci/vfio_pci.c                      |   78 +-
>  drivers/vfio/platform/vfio_platform_common.c     |   31 +-
>  drivers/vfio/vfio.c                              |  322 -
>  drivers/vfio/vfio_iommu_type1.c                  |  808 ++--
>  include/linux/mdev.h                             |  167 +++
>  include/linux/vfio.h                             |   30 +-
>  include/uapi/linux/vfio.h                        |   10 +
>  20 files changed, 4270 insertions(+), 206 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-bus-vfio-mdev
>  create mode 100644 Documentation/vfio-mdev/Makefile
>  create mode 100644 Documentation/vfio-mdev/mtty.c
>  create mode 100644 Documentation/vfio-mdev/vfio-mediated-device.txt
>  create mode 100644 drivers/vfio/mdev/Kconfig
>  create mode 100644 drivers/vfio/mdev/Makefile
>  create mode 100644 drivers/vfio/mdev/mdev_core.c
>  create mode 100644 drivers/vfio/mdev/mdev_driver.c
>  create mode 100644 drivers/vfio/mdev/mdev_private.h
>  create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
>  create mode 100644 drivers/vfio/mdev/vfio_mdev.c
>  create mode 100644 include/linux/mdev.h

A side note:

I rebased KVMGT upon v10, with 2 minor changes:

1, get_user_pages_remote has only 7 args
2, vfio iommu notifier calls vendor callback with iova instead of pfn

so far it works pretty well. Thanks!

--
Thanks,
Jike

Re: [PATCH v10 05/19] vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops

2016-11-01 Thread Jike Song

On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
> Added two new callback functions to struct vfio_iommu_driver_ops. Backend
> IOMMU module that supports pinning and unpinning pages for mdev devices
> should provide these functions.
> Added APIs for pinning and unpinning pages to VFIO module. These call back
> into backend iommu module to actually pin and unpin pages.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: Ia7417723aaae86bec2959ad9ae6c2915ddd340e0
> ---
>  drivers/vfio/vfio.c  | 92
>  include/linux/vfio.h | 12 ++-
>  2 files changed, 103 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 2e83bdf007fe..28b50ca14c52 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1799,6 +1799,98 @@ void vfio_info_cap_shift(struct vfio_info_cap *caps, 
> size_t offset)
>  }
>  EXPORT_SYMBOL_GPL(vfio_info_cap_shift);
>  
> +
> +/*
> + * Pin a set of guest PFNs and return their associated host PFNs for local
> + * domain only.
> + * @dev [in] : device
> + * @user_pfn [in]: array of user/guest PFNs
> + * @npage [in]: count of array elements
> + * @prot [in] : protection flags
> + * @phys_pfn[out] : array of host PFNs
> + */

Hi Kirti,

Would you also document what the return value is? It's kind of unclear.
Also, is there any reason to use long instead of int?
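
To make the request concrete, here is a sketch of the kind of comment being asked for, attached to a toy function. The contract shown (count of pinned entries on success, negative errno on failure) is only one possible convention and is an assumption for illustration; the quoted patch does not state its return contract. toy_pin_pages(), its limit parameter, and the fake PFN translation are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/**
 * toy_pin_pages() - pretend to pin user PFNs (hypothetical helper)
 * @user_pfn: [in]  array of user/guest PFNs
 * @npage:    [in]  number of entries in @user_pfn
 * @phys_pfn: [out] array receiving the "host" PFNs
 * @limit:    [in]  made-up cap on how many pages may be pinned
 *
 * Return: the number of entries written to @phys_pfn, which may be less
 * than @npage, or a negative errno if nothing was pinned.
 */
static long toy_pin_pages(const unsigned long *user_pfn, long npage,
			  unsigned long *phys_pfn, long limit)
{
	long i;

	if (!user_pfn || !phys_pfn || npage <= 0)
		return -EINVAL;

	for (i = 0; i < npage && i < limit; i++)
		phys_pfn[i] = user_pfn[i] + 0x1000;	/* fake translation */

	return i ? i : -ENOMEM;
}
```

Under such a contract a long return type mirrors the long npage argument, so a count can never be truncated; with an int return the comment would have to state an upper bound on npage instead.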

> +long vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
> + long npage, int prot, unsigned long *phys_pfn)
> +{
> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + ssize_t ret;
> +
> + if (!dev || !user_pfn || !phys_pfn)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (IS_ERR(group))
> + return PTR_ERR(group);
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto err_pin_pages;
> +
> + container = group->container;
> + down_read(&container->group_lock);
> +
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->pin_pages))
> + ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
> +  npage, prot, phys_pfn);
> + else
> + ret = -EINVAL;
> +
> + up_read(&container->group_lock);
> + vfio_group_try_dissolve_container(group);
> +
> +err_pin_pages:
> + vfio_group_put(group);
> + return ret;
> +
> +}
> +EXPORT_SYMBOL(vfio_pin_pages);
> +
> +/*
> + * Unpin set of host PFNs for local domain only.
> + * @dev [in] : device
> + * @pfn [in] : array of host PFNs to be unpinned.
> + * @npage [in] :count of elements in array, that is number of pages.
> + */

Ditto

--
Thanks,
Jike

> +long vfio_unpin_pages(struct device *dev, unsigned long *pfn, long npage)
> +{
> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + ssize_t ret;
> +
> + if (!dev || !pfn)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (IS_ERR(group))
> + return PTR_ERR(group);
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto err_unpin_pages;
> +
> + container = group->container;
> + down_read(&container->group_lock);
> +
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->unpin_pages))
> + ret = driver->ops->unpin_pages(container->iommu_data, pfn,
> +npage);
> + else
> + ret = -EINVAL;
> +
> + up_read(&container->group_lock);
> + vfio_group_try_dissolve_container(group);
> +
> +err_unpin_pages:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_unpin_pages);
> +
>  /**
>   * Module/class support
>   */
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 0ecae0b1cd34..0609a2052846 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -75,7 +75,11 @@ struct vfio_iommu_driver_ops {
>   struct iommu_group *group);
>   void(*detach_group)(void *iommu_data,
>   struct iommu_group *group);
> -
> + long(*pin_pages)(void *iommu_data, unsigned long *user_pfn,
> +  long npage, int prot,
> +  unsigned long *phys_pfn);
> + long(*unpin_pages)(void *iommu_data, unsigned long *pfn,
> +long npage);
>  };
>  
>  extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops 
> *ops);
> @@ -127,6 +131,12 @@ static inline long vfio_spapr_iommu_eeh_ioctl(struct 
> iommu_group *group,
>  }
>  #endif /* CONFIG_EEH */
>  
> +extern long vfio_pin_pages(struct device *dev, unsigned long

Re: [PATCH v10 01/19] vfio: Mediated device Core driver

2016-10-31 Thread Jike Song

On 11/01/2016 11:44 AM, Alex Williamson wrote:
> On Tue, 01 Nov 2016 11:08:15 +0800
> Jike Song <jike.s...@intel.com> wrote:
>> On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
>>> +static int mdev_attach_iommu(struct mdev_device *mdev)
>>> +{
>>> +   int ret;
>>> +   struct iommu_group *group;
>>> +
>>> +   group = iommu_group_alloc();
>>> +   if (IS_ERR(group))
>>> +   return PTR_ERR(group);  
>>
>> Maybe I'm overthinking this, but where is the iommu_group released? On
>> successful return you already hold a ref; I didn't find an iommu_group_put
>> after that.
> ...
>>> +
>>> +   ret = iommu_group_add_device(group, &mdev->dev);
>>> +   if (ret)
>>> +   goto attach_fail;
>>> +
>>> +   dev_info(&mdev->dev, "MDEV: group_id = %d\n",
>>> +iommu_group_id(group));
>>> +attach_fail:
>>> +   iommu_group_put(group);
>>> +   return ret;
>>> +}
>>> +
>>> +static void mdev_detach_iommu(struct mdev_device *mdev)
>>> +{
>>> +   iommu_group_remove_device(&mdev->dev);
> 
> Here.  Adding a device to the group takes a group reference so we can
> 'put' the group to release the reference from the alloc as soon as a
> device is added.  When we remove the device, that reference is removed
> and this should result in a release of the group if all other
> references are balanced.  Thanks,

Now I understand, thanks :)

--
Thanks,
Jike
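
For readers following the refcount discussion: the flow Alex describes can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions. toy_group and its helpers are made-up names that mimic the counting behaviour, not the kernel's iommu_group API.

```c
#include <assert.h>

/* Userspace model only: toy_group stands in for struct iommu_group. */
struct toy_group {
	int refs;      /* outstanding references */
	int released;  /* set once the last reference is dropped */
};

/* Models iommu_group_alloc(): the caller starts with one reference. */
static void toy_group_init(struct toy_group *g)
{
	g->refs = 1;
	g->released = 0;
}

/* Models iommu_group_put(): drop one reference, release on zero. */
static void toy_group_put(struct toy_group *g)
{
	assert(g->refs > 0);
	if (--g->refs == 0)
		g->released = 1;
}

/* Models iommu_group_add_device(): the device takes its own reference. */
static void toy_group_add_device(struct toy_group *g)
{
	g->refs++;
}

/* Models iommu_group_remove_device(): the device's reference goes away. */
static void toy_group_remove_device(struct toy_group *g)
{
	toy_group_put(g);
}

/* Same shape as mdev_attach_iommu(): alloc, add device, put the alloc ref. */
static void toy_attach(struct toy_group *g)
{
	toy_group_init(g);
	toy_group_add_device(g);
	toy_group_put(g);
}

/* Same shape as mdev_detach_iommu(): removing the device frees the group. */
static void toy_detach(struct toy_group *g)
{
	toy_group_remove_device(g);
}
```

After toy_attach() the only remaining reference is the device's, so toy_detach() is what finally releases the group, provided all other references are balanced.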

Re: [PATCH v10 01/19] vfio: Mediated device Core driver

2016-10-31 Thread Jike Song

On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
> Design for Mediated Device Driver:
> Main purpose of this driver is to provide a common interface for mediated
> device management that can be used by different drivers of different
> devices.
> 
> This module provides a generic interface to create the device, add it to
> mediated bus, add device to IOMMU group and then add it to vfio group.
> 
> Below is the high Level block diagram, with Nvidia, Intel and IBM devices
> as example, since these are the devices which are going to actively use
> this module as of now.
> 
>  +---------------+
>  |               |
>  | +-----------+ |  mdev_register_driver() +--------------+
>  | |           | +<------------------------+ __init()     |
>  | |  mdev     | |                         |              |
>  | |  bus      | +------------------------>+              |<-> VFIO user
>  | |  driver   | |     probe()/remove()    | vfio_mdev.ko |    APIs
>  | |           | |                         |              |
>  | +-----------+ |                         +--------------+
>  |               |
>  |  MDEV CORE    |
>  |   MODULE      |
>  |   mdev.ko     |
>  | +-----------+ |  mdev_register_device() +--------------+
>  | |           | +<------------------------+              |
>  | |           | |                         |  nvidia.ko   |<-> physical
>  | |           | +------------------------>+              |    device
>  | |           | |        callback         +--------------+
>  | | Physical  | |
>  | |  device   | |  mdev_register_device() +--------------+
>  | | interface | |<------------------------+              |
>  | |           | |                         |  i915.ko     |<-> physical
>  | |           | +------------------------>+              |    device
>  | |           | |        callback         +--------------+
>  | |           | |
>  | |           | |  mdev_register_device() +--------------+
>  | |           | +<------------------------+              |
>  | |           | |                         | ccw_device.ko|<-> physical
>  | |           | +------------------------>+              |    device
>  | |           | |        callback         +--------------+
>  | +-----------+ |
>  +---------------+
> 
> Core driver provides two types of registration interfaces:
> 1. Registration interface for mediated bus driver:
> 
> /**
>   * struct mdev_driver - Mediated device's driver
>   * @name: driver name
>   * @probe: called when new device created
>   * @remove:called when device removed
>   * @driver:device driver structure
>   *
>   **/
> struct mdev_driver {
>  const char *name;
>  int  (*probe)  (struct device *dev);
>  void (*remove) (struct device *dev);
>  struct device_driver driver;
> };
> 
> Mediated bus driver for mdev device should use this interface to register
> and unregister with core driver respectively:
> 
> int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> void mdev_unregister_driver(struct mdev_driver *drv);
> 
> Mediated bus driver is responsible to add/delete mediated devices to/from
> VFIO group when devices are bound and unbound to the driver.
> 
> 2. Physical device driver interface
> This interface provides the vendor driver with a set of APIs to manage
> physical-device-related work in its driver. The APIs are:
> 
> * dev_attr_groups: attributes of the parent device.
> * mdev_attr_groups: attributes of the mediated device.
> * supported_type_groups: attributes to define supported type. This is
>mandatory field.
> * create: to allocate basic resources in vendor driver for a mediated
>  device. This is mandatory to be provided by vendor driver.
> * remove: to free resources in vendor driver when mediated device is
>  destroyed. This is mandatory to be provided by vendor driver.
> * open: open callback of mediated device
> * release: release callback of mediated device
> * read : read emulation callback.
> * write: write emulation callback.
> * mmap: mmap emulation callback.
> * ioctl: ioctl callback.
> 
> Drivers should use these interfaces to register and unregister device to
> mdev core driver respectively:
> 
> extern int  mdev_register_device(struct device *dev,
>  const struct parent_ops *ops);
> extern void mdev_unregister_device(struct device *dev);
> 
> There are no locks to serialize above callbacks in mdev driver and
> vfio_mdev driver. If required, vendor driver can have locks to serialize
> above APIs in their driver.

Maybe some information above could be placed under Documentation instead?
It's kind of weird to have the block diagram and interfaces documentation
in the git commit message :-)
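
As an aside, the quoted description marks create, remove and supported_type_groups as mandatory. A minimal userspace sketch of validating such an ops table at registration time could look like the following. The toy_parent_ops layout and toy_check_parent_ops() are assumptions made for illustration, not the patch's struct parent_ops.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative stand-in for an ops table with mandatory and optional hooks. */
struct toy_parent_ops {
	const void *supported_type_groups; /* mandatory */
	int (*create)(void);               /* mandatory */
	int (*remove)(void);               /* mandatory */
	int (*open)(void);                 /* optional */
};

/* Reject registration when any mandatory member is missing. */
static int toy_check_parent_ops(const struct toy_parent_ops *ops)
{
	if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
		return -EINVAL;
	return 0;
}

/* Trivial callback used to populate the table in examples. */
static int toy_noop(void)
{
	return 0;
}
```

Checking up front keeps later call sites free of NULL tests on the mandatory callbacks; the optional ones still need a check before each call.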

> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: I73a5084574270b14541c529461ea2f03c292d510
> ---
>  drivers/vfio/Kconfig |   1 +
>  drivers/vfio/Makefile|   1 +
>

Re: [PATCH v10 18/19] docs: Sysfs ABI for mediated device framework

2016-10-31 Thread Jike Song

On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
> Added details of sysfs ABI for mediated device framework
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: Icb0fd4ed58a2fa793fbcb1c3d5009a4403c1f3ac
> ---
>  Documentation/ABI/testing/sysfs-bus-vfio-mdev | 111 
> ++
>  1 file changed, 111 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-bus-vfio-mdev
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-vfio-mdev 
> b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
> new file mode 100644
> index ..452dbe39270e
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-bus-vfio-mdev
> @@ -0,0 +1,111 @@
> +What:   /sys/...//mdev_supported_types/
> +Date:   October 2016
> +Contact:Kirti Wankhede 
> +Description:
> +This directory contains list of directories of currently
> + supported mediated device types and their details for
> + . Supported type attributes are defined by the
> + vendor driver who registers with Mediated device framework.
> + Each supported type is a directory whose name is created
> + by adding the device driver string as a prefix to the
> + string provided by the vendor driver.
> +
> +What:   /sys/...//mdev_supported_types//
> +Date:   October 2016
> +Contact:Kirti Wankhede 
> +Description:
> +This directory gives details of supported type, like name,
> + description, available_instances, device_api etc.
> + 'device_api' and 'available_instances' are mandatory
> + attributes to be provided by vendor driver. 'name',
> + 'description' and other vendor driver specific attributes
> + are optional.
> +

Hi Kirti,

Is there any checking in the mdev framework that mandatory attributes
are actually provided?

--
Thanks,
Jike
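
To make the question concrete, here is a hedged userspace sketch of the kind of check I have in mind: given the attribute names a vendor driver supplies for one type, verify that 'device_api' and 'available_instances' are present. The helper names and the NULL-terminated name list are assumptions for illustration; they say nothing about how the framework stores attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `want` appears in the NULL-terminated list `names`. */
static int toy_has_attr(const char *const *names, const char *want)
{
	for (; names && *names; names++)
		if (strcmp(*names, want) == 0)
			return 1;
	return 0;
}

/* The two attributes the ABI text declares mandatory for every type. */
static int toy_type_has_mandatory_attrs(const char *const *names)
{
	return toy_has_attr(names, "device_api") &&
	       toy_has_attr(names, "available_instances");
}
```

A type that only exposes optional attributes such as 'name' and 'description' would fail this check.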

> +What:   /sys/.../mdev_supported_types//create
> +Date:   October 2016
> +Contact:Kirti Wankhede 
> +Description:
> + Writing UUID to this file will create mediated device of
> + type  for parent device . This is a
> + write-only file.
> + For example:
> + # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \
> +/sys/devices/foo/mdev_supported_types/foo-1/create
> +
> +What:   /sys/.../mdev_supported_types//devices/
> +Date:   October 2016
> +Contact:Kirti Wankhede 
> +Description:
> + This directory contains symbolic links pointing to mdev
> + devices sysfs entries which are created of this .
> +
> +What:   /sys/.../mdev_supported_types//available_instances
> +Date:   October 2016
> +Contact:Kirti Wankhede 
> +Description:
> + Reading this attribute will show the number of mediated
> + devices of type  that can be created. This is a
> + readonly file.
> +Users:
> + Userspace applications interested in creating mediated
> + device of that type. Userspace application should check
> + the number of available instances could be created before
> + creating mediated device of this type.
> +
> +What:   /sys/.../mdev_supported_types//device_api
> +Date:   October 2016
> +Contact:Kirti Wankhede 
> +Description:
> + Reading this attribute will show VFIO device API supported
> + by this type. For example, "vfio-pci" for a PCI device,
> + "vfio-platform" for platform device.
> +
> +What:   /sys/.../mdev_supported_types//name
> +Date:   October 2016
> +Contact:Kirti Wankhede 
> +Description:
> + Reading this attribute will show human readable name of the
> + mediated device that will get created of type .
> + This is optional attribute. For example: "Grid M60-0Q"
> +Users:
> + Userspace applications interested in knowing the name of
> + a particular  that can help in understanding the
> + type of mediated device.
> +
> +What:   /sys/.../mdev_supported_types//description
> +Date:   October 2016
> +Contact:Kirti Wankhede 
> +Description:
> + Reading this attribute will show description of the type of
> + mediated device that will get created of type .
> + This is optional attribute. For example:
> + "2 heads, 512M FB, 2560x1600 maximum resolution"
> +Users:
> + Userspace applications interested in knowing the details of
> + a particular  that can help in understanding the
> + features

Re: [PATCH v10 10/19] vfio iommu: Add blocking notifier to notify DMA_UNMAP

2016-10-31 Thread Jike Song

On 10/31/2016 01:59 PM, Kirti Wankhede wrote:
> On 10/31/2016 9:20 AM, Jike Song wrote:
>> On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
>>> Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers
>>> about DMA_UNMAP.
>>> Exported two APIs vfio_register_notifier() and vfio_unregister_notifier().
>>> Vendor driver should register notifier using these APIs.
>>> Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate
>>> mappings.
>>>
>>> Signed-off-by: Kirti Wankhede <kwankh...@nvidia.com>
>>> Signed-off-by: Neo Jia <c...@nvidia.com>
>>> Change-Id: I5910d0024d6be87f3e8d3e0ca0eaeaaa0b17f271
>>> ---
>>>  drivers/vfio/vfio.c | 73 +
>>>  drivers/vfio/vfio_iommu_type1.c | 89 
>>> -
>>>  include/linux/vfio.h| 11 +
>>>  3 files changed, 163 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>>> index 28b50ca14c52..ff05ac6b1e90 100644
>>> --- a/drivers/vfio/vfio.c
>>> +++ b/drivers/vfio/vfio.c
>>> @@ -1891,6 +1891,79 @@ err_unpin_pages:
>>>  }
>>>  EXPORT_SYMBOL(vfio_unpin_pages);
>>>  
>>> +int vfio_register_notifier(struct device *dev, struct notifier_block *nb)
>>> +{
>>
>> Hi Kirti,
>>
>> Given that the 4 methods below are members of vfio_iommu_driver_ops:
>>
>>  pin_pages
>>  unpin_pages
>>  register_notifier
>>  unregister_notifier
>>
>> the names of exposed VFIO APIs could possibly be clearer:
>>
>>  vfio_iommu_pin_pages
>>  vfio_iommu_unpin_pages
>>  vfio_iommu_register_notifier
>>  vfio_iommu_unregister_notifier
>>
> 
> Hey Jike,
> 
> I had followed the same style as other members in this structure:
> 
>   attach_group
>   detach_group
> 

I mean the exposed APIs. For example, the name vfio_register_notifier() is
too generic to tell what it is provided for.

--
Thanks,
Jike
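
For context, the pattern under discussion (register a callback, get called on DMA_UNMAP, unregister later) can be sketched in userspace as below. Everything here is hypothetical: toy_chain, toy_notifier and TOY_NOTIFY_DMA_UNMAP are illustration-only names, and the kernel's blocking notifier chains also handle locking and priorities, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the VFIO_IOMMU_NOTIFY_DMA_UNMAP action. */
#define TOY_NOTIFY_DMA_UNMAP 1UL

struct toy_notifier {
	int (*call)(struct toy_notifier *nb, unsigned long action, void *data);
	struct toy_notifier *next;
};

struct toy_chain {
	struct toy_notifier *head;
};

/* Add a notifier at the front of the chain. */
static void toy_chain_register(struct toy_chain *c, struct toy_notifier *nb)
{
	nb->next = c->head;
	c->head = nb;
}

/* Unlink a notifier; returns 0 on success, -1 if it was not registered. */
static int toy_chain_unregister(struct toy_chain *c, struct toy_notifier *nb)
{
	struct toy_notifier **p;

	for (p = &c->head; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return 0;
		}
	}
	return -1;
}

/* Invoke every registered callback; returns how many were called. */
static int toy_chain_call(struct toy_chain *c, unsigned long action, void *data)
{
	struct toy_notifier *nb;
	int called = 0;

	for (nb = c->head; nb; nb = nb->next) {
		nb->call(nb, action, data);
		called++;
	}
	return called;
}

/* Example vendor-side callback: count the unmap events it sees. */
static int toy_unmap_seen;

static int toy_on_event(struct toy_notifier *nb, unsigned long action,
			void *data)
{
	(void)nb;
	(void)data;
	if (action == TOY_NOTIFY_DMA_UNMAP)
		toy_unmap_seen++;
	return 0;
}
```

In the patch the vendor callback would invalidate its mappings on the unmap action; the counter above is just a placeholder for that work.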

Re: [PATCH v10 10/19] vfio iommu: Add blocking notifier to notify DMA_UNMAP

2016-10-30 Thread Jike Song

On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
> Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers
> about DMA_UNMAP.
> Exported two APIs vfio_register_notifier() and vfio_unregister_notifier().
> Vendor driver should register notifier using these APIs.
> Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate
> mappings.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: I5910d0024d6be87f3e8d3e0ca0eaeaaa0b17f271
> ---
>  drivers/vfio/vfio.c | 73 +
>  drivers/vfio/vfio_iommu_type1.c | 89 
> -
>  include/linux/vfio.h| 11 +
>  3 files changed, 163 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 28b50ca14c52..ff05ac6b1e90 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1891,6 +1891,79 @@ err_unpin_pages:
>  }
>  EXPORT_SYMBOL(vfio_unpin_pages);
>  
> +int vfio_register_notifier(struct device *dev, struct notifier_block *nb)
> +{

Hi Kirti,

Given that the 4 methods below are members of vfio_iommu_driver_ops:

pin_pages
unpin_pages
register_notifier
unregister_notifier

the names of exposed VFIO APIs could possibly be clearer:

vfio_iommu_pin_pages
vfio_iommu_unpin_pages
vfio_iommu_register_notifier
vfio_iommu_unregister_notifier

--
Thanks,
Jike
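
Part of why I suggest the vfio_iommu_ prefix is that these exported functions are thin wrappers that forward to the IOMMU backend's op, or fail with -EINVAL when the backend does not provide one. A rough userspace sketch of that dispatch shape follows. The toy_* types are invented for illustration, and unlike the quoted code this version also guards against a NULL ops pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative backend ops table with a single optional hook. */
struct toy_iommu_ops {
	int (*register_notifier)(void *iommu_data, void *nb);
};

struct toy_driver {
	const struct toy_iommu_ops *ops;
};

/* Forward to the backend if it implements the hook, else -EINVAL. */
static int toy_register_notifier(const struct toy_driver *driver,
				 void *iommu_data, void *nb)
{
	if (!nb)
		return -EINVAL;
	if (driver && driver->ops && driver->ops->register_notifier)
		return driver->ops->register_notifier(iommu_data, nb);
	return -EINVAL;
}

/* Example backend implementation that accepts every notifier. */
static int toy_backend_register(void *iommu_data, void *nb)
{
	(void)iommu_data;
	(void)nb;
	return 0;
}
```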

> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + ssize_t ret;
> +
> + if (!dev || !nb)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (IS_ERR(group))
> + return PTR_ERR(group);
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto err_register_nb;
> +
> + container = group->container;
> + down_read(&container->group_lock);
> +
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->register_notifier))
> + ret = driver->ops->register_notifier(container->iommu_data, nb);
> + else
> + ret = -EINVAL;
> +
> + up_read(&container->group_lock);
> + vfio_group_try_dissolve_container(group);
> +
> +err_register_nb:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_register_notifier);
> +
> +int vfio_unregister_notifier(struct device *dev, struct notifier_block *nb)
> +{
> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + ssize_t ret;
> +
> + if (!dev || !nb)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (IS_ERR(group))
> + return PTR_ERR(group);
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto err_unregister_nb;
> +
> + container = group->container;
> + down_read(&container->group_lock);
> +
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->unregister_notifier))
> + ret = driver->ops->unregister_notifier(container->iommu_data,
> +nb);
> + else
> + ret = -EINVAL;
> +
> + up_read(&container->group_lock);
> + vfio_group_try_dissolve_container(group);
> +
> +err_unregister_nb:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_unregister_notifier);
> +
>  /**
>   * Module/class support
>   */
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5add11a147e1..a4bd331ac0fd 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson "
> @@ -59,6 +60,7 @@ struct vfio_iommu {
>   struct vfio_domain  *external_domain; /* domain for external user */
>   struct mutex    lock;
>   struct rb_root  dma_list;
> + struct blocking_notifier_head notifier;
>   bool    v2;
>   bool    nesting;
>  };
> @@ -549,7 +551,8 @@ static long vfio_iommu_type1_pin_pages(void *iommu_data,
>  
>   mutex_lock(&iommu->lock);
>  
> - if (!iommu->external_domain) {
> + /* Fail if notifier list is empty */
> + if ((!iommu->external_domain) || (!iommu->notifier.head)) {
>   ret = -EINVAL;
>   goto pin_done;
>   }
> @@ -768,6 +771,50 @@ static unsigned long vfio_pgsize_bitmap(struct 
> vfio_iommu *iommu)
>   return bitmap;
>  }
>  
> +/*
> + * This function finds pfn in domain->external_addr_space->pfn_list for given
> + * iova range. If pfn exist, notify pfn to registered notifier list. On
> + * receiving notifier callback, vendor driver should invalidate the mapping 
> and

Re: [PATCH v10 10/19] vfio iommu: Add blocking notifier to notify DMA_UNMAP

2016-10-30 Thread Jike Song

On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
> Added blocking notifier to IOMMU TYPE1 driver to notify vendor drivers
> about DMA_UNMAP.
> Exported two APIs vfio_register_notifier() and vfio_unregister_notifier().
> Vendor driver should register notifer using these APIs.
> Vendor driver should use VFIO_IOMMU_NOTIFY_DMA_UNMAP action to invalidate
> mappings.
> 
> Signed-off-by: Kirti Wankhede 
> Signed-off-by: Neo Jia 
> Change-Id: I5910d0024d6be87f3e8d3e0ca0eaeaaa0b17f271
> ---
>  drivers/vfio/vfio.c | 73 +
>  drivers/vfio/vfio_iommu_type1.c | 89 
> -
>  include/linux/vfio.h| 11 +
>  3 files changed, 163 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 28b50ca14c52..ff05ac6b1e90 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1891,6 +1891,79 @@ err_unpin_pages:
>  }
>  EXPORT_SYMBOL(vfio_unpin_pages);
>  
> +int vfio_register_notifier(struct device *dev, struct notifier_block *nb)
> +{

Hi Kirti,

Given that below 4 methods are members of vfio_iommu_driver_ops:

pin_pages
unpin_pages
register_notifier
unregister_notifier

the names of exposed VFIO APIs could possibly be clearer:

vfio_iommu_pin_pages
vfio_iommu_unpin_pages
vfio_iommu_register_notifier
vfio_iommu_unreigster_nodier

--
Thanks,
Jike

> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + ssize_t ret;
> +
> + if (!dev || !nb)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (IS_ERR(group))
> + return PTR_ERR(group);
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto err_register_nb;
> +
> + container = group->container;
> + down_read(>group_lock);
> +
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->register_notifier))
> + ret = driver->ops->register_notifier(container->iommu_data, nb);
> + else
> + ret = -EINVAL;
> +
> + up_read(>group_lock);
> + vfio_group_try_dissolve_container(group);
> +
> +err_register_nb:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_register_notifier);
> +
> +int vfio_unregister_notifier(struct device *dev, struct notifier_block *nb)
> +{
> + struct vfio_container *container;
> + struct vfio_group *group;
> + struct vfio_iommu_driver *driver;
> + ssize_t ret;
> +
> + if (!dev || !nb)
> + return -EINVAL;
> +
> + group = vfio_group_get_from_dev(dev);
> + if (IS_ERR(group))
> + return PTR_ERR(group);
> +
> + ret = vfio_group_add_container_user(group);
> + if (ret)
> + goto err_unregister_nb;
> +
> + container = group->container;
> + down_read(>group_lock);
> +
> + driver = container->iommu_driver;
> + if (likely(driver && driver->ops->unregister_notifier))
> + ret = driver->ops->unregister_notifier(container->iommu_data,
> +nb);
> + else
> + ret = -EINVAL;
> +
> + up_read(>group_lock);
> + vfio_group_try_dissolve_container(group);
> +
> +err_unregister_nb:
> + vfio_group_put(group);
> + return ret;
> +}
> +EXPORT_SYMBOL(vfio_unregister_notifier);
> +
>  /**
>   * Module/class support
>   */
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 5add11a147e1..a4bd331ac0fd 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define DRIVER_VERSION  "0.2"
>  #define DRIVER_AUTHOR   "Alex Williamson "
> @@ -59,6 +60,7 @@ struct vfio_iommu {
>   struct vfio_domain  *external_domain; /* domain for external user */
>   struct mutexlock;
>   struct rb_root  dma_list;
> + struct blocking_notifier_head notifier;
>   boolv2;
>   boolnesting;
>  };
> @@ -549,7 +551,8 @@ static long vfio_iommu_type1_pin_pages(void *iommu_data,
>  
>   mutex_lock(>lock);
>  
> - if (!iommu->external_domain) {
> + /* Fail if notifier list is empty */
> + if ((!iommu->external_domain) || (!iommu->notifier.head)) {
>   ret = -EINVAL;
>   goto pin_done;
>   }
> @@ -768,6 +771,50 @@ static unsigned long vfio_pgsize_bitmap(struct 
> vfio_iommu *iommu)
>   return bitmap;
>  }
>  
> +/*
> + * This function finds pfn in domain->external_addr_space->pfn_list for given
> + * iova range. If pfn exist, notify pfn to registered notifier list. On
> + * receiving notifier callback, vendor driver should invalidate the mapping and
> + * call vfio_unpin_pages() to unpin this pfn. With that vfio_pfn

Re: [PATCH v10 01/19] vfio: Mediated device Core driver

2016-10-29 Thread Jike Song

On 10/29/2016 06:06 PM, Kirti Wankhede wrote:
> 
> 
> On 10/29/2016 10:00 AM, Jike Song wrote:
>> On 10/27/2016 05:29 AM, Kirti Wankhede wrote:
>>> +int mdev_register_device(struct device *dev, const struct parent_ops *ops)
>>> +{
>>> +   int ret;
>>> +   struct parent_device *parent;
>>> +
>>> +   /* check for mandatory ops */
>>> +   if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
>>> +   return -EINVAL;
>>> +
>>> +   dev = get_device(dev);
>>> +   if (!dev)
>>> +   return -EINVAL;
>>> +
>>> +   mutex_lock(&parent_list_lock);
>>> +
>>> +   /* Check for duplicate */
>>> +   parent = __find_parent_device(dev);
>>> +   if (parent) {
>>> +   ret = -EEXIST;
>>> +   goto add_dev_err;
>>> +   }
>>> +
>>> +   parent = kzalloc(sizeof(*parent), GFP_KERNEL);
>>> +   if (!parent) {
>>> +   ret = -ENOMEM;
>>> +   goto add_dev_err;
>>> +   }
>>> +
>>> +   kref_init(&parent->ref);
>>> +   mutex_init(&parent->lock);
>>> +
>>> +   parent->dev = dev;
>>> +   parent->ops = ops;
>>> +
>>> +   ret = parent_create_sysfs_files(parent);
>>> +   if (ret) {
>>> +   mutex_unlock(&parent_list_lock);
>>> +   mdev_put_parent(parent);
>>> +   return ret;
>>> +   }
>>> +
>>> +   ret = class_compat_create_link(mdev_bus_compat_class, dev, NULL);
>>> +   if (ret)
>>> +   dev_warn(dev, "Failed to create compatibility class link\n");
>>> +
>>
>> Hi Kirti,
>>
>> As I replied to the previous version:
>>
>>  http://www.spinics.net/lists/kvm/msg139331.html
>>
> 
> Hi Jike,
> 
> I saw your reply, but by that time the v10 version of the patch series
> was already out for review.
> 

Ah..yes, I forgot that :)

>> You can always check whether mdev_bus_compat_class is already
>> registered here, and register it if not. The same logic should be
>> adopted in mdev_init.
>>
>> The current implementation will simply panic if configured as builtin,
>> which is rare but far from impossible.
>>
> 
> Can you verify whether the attached patch works for you on top of the
> v10 patch set? I'll incorporate this change in my next version.
> 

Seems cool. But would you please also keep the registration in mdev_init(),
and just check for the case where it was already registered? Thanks!
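
To illustrate the "register it if not yet" idea, here is a minimal userspace
sketch in plain C. It is not the attached patch and not kernel code: the
struct, the fake_class_compat_register() stand-in and the sketch_*() names
are all made up for illustration. The only point is that both the init path
and the device-register path go through one helper, so whichever runs first
does the registration and the other one finds it already done.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's compat class object. */
struct compat_class {
	const char *name;
};

/* NULL until the class has been registered by either path. */
static struct compat_class *bus_compat_class;

/* Counts real registrations, only so the sketch can be checked. */
static int register_calls;

/* Hypothetical stand-in for class_compat_register(). */
static struct compat_class *fake_class_compat_register(const char *name)
{
	struct compat_class *cls = malloc(sizeof(*cls));

	if (!cls)
		return NULL;
	cls->name = name;
	register_calls++;
	return cls;
}

/* Register the class only if nobody has done so yet. */
static int ensure_compat_class(void)
{
	if (bus_compat_class)
		return 0;

	bus_compat_class = fake_class_compat_register("mdev_bus");
	return bus_compat_class ? 0 : -ENOMEM;
}

/* Init path: keeps its registration, but tolerates "already registered". */
static int sketch_init(void)
{
	return ensure_compat_class();
}

/* Register path: may run before the init path when built in. */
static int sketch_register_device(void)
{
	int ret = ensure_compat_class();

	if (ret)
		return ret;

	/* bus_compat_class is non-NULL here, so linking to it is safe. */
	return 0;
}
```

Calling sketch_register_device() before sketch_init() then registers the
class exactly once. Real code would also need to serialize the check against
concurrent callers, which this sketch leaves out.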

--
Thanks,
Jike
