Re: [Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-26 Thread John Hubbard
On 6/25/19 10:45 PM, Michal Hocko wrote:
> On Tue 25-06-19 20:15:28, John Hubbard wrote:
>> On 6/19/19 12:27 PM, Jason Gunthorpe wrote:
>>> On Thu, Jun 13, 2019 at 06:23:04PM -0700, John Hubbard wrote:
 On 6/13/19 5:43 PM, Ira Weiny wrote:
> On Thu, Jun 13, 2019 at 07:58:29PM +, Jason Gunthorpe wrote:
>> On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote:
>>>
 ...
> So I think it is ok.  Frankly I was wondering if we should remove the 
> public
> type altogether but conceptually it seems ok.  But I don't see any users 
> of it
> so...  should we get rid of it in the code rather than turning the config 
> off?
>
> Ira

 That seems reasonable. I recall that the hope was for those IBM Power 9
 systems to use _PUBLIC, as they have hardware-based coherent device (GPU)
 memory, and so the memory really is visible to the CPU. And the IBM team
 was thinking of taking advantage of it. But I haven't seen anything on
 that front for a while.
>>>
>>> Does anyone know who those people are and can we encourage them to
>>> send some patches? :)
>>>
>>
>> I asked about this, and it seems that the idea was: DEVICE_PUBLIC was there
>> in order to provide an alternative way to do things (such as migrate memory
>> to and from a device), in case the combination of existing and near-future
>> NUMA APIs was insufficient. This probably came as a follow-up to the early
>> 2017-ish conversations about NUMA, in which the linux-mm recommendation was
>> "try using HMM mechanisms, and if those are inadequate, then maybe we can
>> look at enhancing NUMA so that it has better handling of advanced (GPU-like)
>> devices".
> 
> Yes that was the original idea. It sounds so much better to use a common
> framework rather than awkward special cased cpuless NUMA nodes with
> a weird semantic. User of the neither of the two has shown up so I guess
> that the envisioned HW just didn't materialized. Or has there been a
> completely different approach chosen?

The HW showed up, alright: it's the IBM Power 9, which provides HW-based
memory coherency between its CPUs and GPUs. So on this system, the CPU is
allowed to access GPU memory, which *could* be modeled as DEVICE_PUBLIC.

However, what happened was that the system worked well enough with a combination
of the device driver, plus NUMA APIs, plus heaven knows what sort of /proc 
tuning
might have also gone on. :) No one saw the need to reach for the DEVICE_PUBLIC
functionality.

> 
>> In the end, however, _PUBLIC was never used, nor does anyone in the local
>> (NVIDIA + IBM) kernel vicinity seem to have plans to use it.  So it really
>> does seem safe to remove, although of course it's good to start with 
>> BROKEN and see if anyone pops up and complains.
> 
> Well, I do not really see much of a difference. Preserving an unused
> code which doesn't have any user in sight just adds a maintenance burden
> whether the code depends on BROKEN or not. We can always revert patches
> which remove the code once a real user shows up.

Sure, I don't see much difference either. Either way seems fine.

thanks,
-- 
John Hubbard
NVIDIA
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-25 Thread Michal Hocko
On Tue 25-06-19 20:15:28, John Hubbard wrote:
> On 6/19/19 12:27 PM, Jason Gunthorpe wrote:
> > On Thu, Jun 13, 2019 at 06:23:04PM -0700, John Hubbard wrote:
> >> On 6/13/19 5:43 PM, Ira Weiny wrote:
> >>> On Thu, Jun 13, 2019 at 07:58:29PM +, Jason Gunthorpe wrote:
>  On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote:
> >
> >> ...
> >>> So I think it is ok.  Frankly I was wondering if we should remove the 
> >>> public
> >>> type altogether but conceptually it seems ok.  But I don't see any users 
> >>> of it
> >>> so...  should we get rid of it in the code rather than turning the config 
> >>> off?
> >>>
> >>> Ira
> >>
> >> That seems reasonable. I recall that the hope was for those IBM Power 9
> >> systems to use _PUBLIC, as they have hardware-based coherent device (GPU)
> >> memory, and so the memory really is visible to the CPU. And the IBM team
> >> was thinking of taking advantage of it. But I haven't seen anything on
> >> that front for a while.
> > 
> > Does anyone know who those people are and can we encourage them to
> > send some patches? :)
> > 
> 
> I asked about this, and it seems that the idea was: DEVICE_PUBLIC was there
> in order to provide an alternative way to do things (such as migrate memory
> to and from a device), in case the combination of existing and near-future
> NUMA APIs was insufficient. This probably came as a follow-up to the early
> 2017-ish conversations about NUMA, in which the linux-mm recommendation was
> "try using HMM mechanisms, and if those are inadequate, then maybe we can
> look at enhancing NUMA so that it has better handling of advanced (GPU-like)
> devices".

Yes that was the original idea. It sounds so much better to use a common
framework rather than awkward special cased cpuless NUMA nodes with
a weird semantic. User of the neither of the two has shown up so I guess
that the envisioned HW just didn't materialized. Or has there been a
completely different approach chosen?

> In the end, however, _PUBLIC was never used, nor does anyone in the local
> (NVIDIA + IBM) kernel vicinity seem to have plans to use it.  So it really
> does seem safe to remove, although of course it's good to start with 
> BROKEN and see if anyone pops up and complains.

Well, I do not really see much of a difference. Preserving an unused
code which doesn't have any user in sight just adds a maintenance burden
whether the code depends on BROKEN or not. We can always revert patches
which remove the code once a real user shows up.
-- 
Michal Hocko
SUSE Labs
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-25 Thread John Hubbard
On 6/19/19 12:27 PM, Jason Gunthorpe wrote:
> On Thu, Jun 13, 2019 at 06:23:04PM -0700, John Hubbard wrote:
>> On 6/13/19 5:43 PM, Ira Weiny wrote:
>>> On Thu, Jun 13, 2019 at 07:58:29PM +, Jason Gunthorpe wrote:
 On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote:
>
>> ...
>>> So I think it is ok.  Frankly I was wondering if we should remove the public
>>> type altogether but conceptually it seems ok.  But I don't see any users of 
>>> it
>>> so...  should we get rid of it in the code rather than turning the config 
>>> off?
>>>
>>> Ira
>>
>> That seems reasonable. I recall that the hope was for those IBM Power 9
>> systems to use _PUBLIC, as they have hardware-based coherent device (GPU)
>> memory, and so the memory really is visible to the CPU. And the IBM team
>> was thinking of taking advantage of it. But I haven't seen anything on
>> that front for a while.
> 
> Does anyone know who those people are and can we encourage them to
> send some patches? :)
> 

I asked about this, and it seems that the idea was: DEVICE_PUBLIC was there
in order to provide an alternative way to do things (such as migrate memory
to and from a device), in case the combination of existing and near-future
NUMA APIs was insufficient. This probably came as a follow-up to the early
2017-ish conversations about NUMA, in which the linux-mm recommendation was
"try using HMM mechanisms, and if those are inadequate, then maybe we can
look at enhancing NUMA so that it has better handling of advanced (GPU-like)
devices".

In the end, however, _PUBLIC was never used, nor does anyone in the local
(NVIDIA + IBM) kernel vicinity seem to have plans to use it.  So it really
does seem safe to remove, although of course it's good to start with 
BROKEN and see if anyone pops up and complains.

thanks,
-- 
John Hubbard
NVIDIA
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-25 Thread Christoph Hellwig
On Tue, Jun 25, 2019 at 11:44:28AM +, Jason Gunthorpe wrote:
> Which tree and what does the resolution look like?

Looks like in -mm.  The current commit in linux-next is:

commit 0d23b042f26955fb35721817beb98ba7f1d9ed9f
Author: Robin Murphy 
Date:   Fri Jun 14 10:42:14 2019 +1000

mm: clean up is_device_*_page() definitions


> Also, I don't want to be making the decision if we should keep/remove
> DEVICE_PUBLIC, so let's get an Ack from Andrew/etc?
> 
> My main reluctance is that I know there is HW out there that can do
> coherent, and I want to believe they are coming with patches, just
> too slowly. But I'd also rather those people defend themselves :P

Lets keep everything as-is for now.  I'm pretty certain nothing
will show up, but letting this linger another release or two
shouldn't be much of a problem.  And if we urgently feel like removing
it we can do it after -rc1.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-25 Thread Christoph Hellwig
On Thu, Jun 20, 2019 at 09:26:48PM +0200, Michal Hocko wrote:
> On Thu 13-06-19 11:43:21, Christoph Hellwig wrote:
> > The code hasn't been used since it was added to the tree, and doesn't
> > appear to actually be usable.  Mark it as BROKEN until either a user
> > comes along or we finally give up on it.
> 
> I would go even further and simply remove all the DEVICE_PUBLIC code.

I looked into that as I now got the feedback twice.  It would
create a conflict with another tree cleaning things up around the
is_device_private defintion, but otherwise I'd be glad to just remove
it.

Jason, as this goes through your tree, do you mind the additional
conflict?
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-20 Thread Michal Hocko
On Thu 13-06-19 11:43:21, Christoph Hellwig wrote:
> The code hasn't been used since it was added to the tree, and doesn't
> appear to actually be usable.  Mark it as BROKEN until either a user
> comes along or we finally give up on it.

I would go even further and simply remove all the DEVICE_PUBLIC code.

> Signed-off-by: Christoph Hellwig 

Anyway
Acked-by: Michal Hocko 

> ---
>  mm/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 0d2ba7e1f43e..406fa45e9ecc 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -721,6 +721,7 @@ config DEVICE_PRIVATE
>  config DEVICE_PUBLIC
>   bool "Addressable device memory (like GPU memory)"
>   depends on ARCH_HAS_HMM
> + depends on BROKEN
>   select HMM
>   select DEV_PAGEMAP_OPS
>  
> -- 
> 2.20.1

-- 
Michal Hocko
SUSE Labs
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-19 Thread Dan Williams
On Wed, Jun 19, 2019 at 12:42 PM Jason Gunthorpe  wrote:
>
> On Thu, Jun 13, 2019 at 06:23:04PM -0700, John Hubbard wrote:
> > On 6/13/19 5:43 PM, Ira Weiny wrote:
> > > On Thu, Jun 13, 2019 at 07:58:29PM +, Jason Gunthorpe wrote:
> > >> On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote:
> > >>>
> > ...
> > >> Hum, so the only thing this config does is short circuit here:
> > >>
> > >> static inline bool is_device_public_page(const struct page *page)
> > >> {
> > >> return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
> > >> IS_ENABLED(CONFIG_DEVICE_PUBLIC) &&
> > >> is_zone_device_page(page) &&
> > >> page->pgmap->type == MEMORY_DEVICE_PUBLIC;
> > >> }
> > >>
> > >> Which is called all over the place..
> > >
> > >   yes but the earlier patch:
> > >
> > > [PATCH 03/22] mm: remove hmm_devmem_add_resource
> > >
> > > Removes the only place type is set to MEMORY_DEVICE_PUBLIC.
> > >
> > > So I think it is ok.  Frankly I was wondering if we should remove the 
> > > public
> > > type altogether but conceptually it seems ok.  But I don't see any users 
> > > of it
> > > so...  should we get rid of it in the code rather than turning the config 
> > > off?
> > >
> > > Ira
> >
> > That seems reasonable. I recall that the hope was for those IBM Power 9
> > systems to use _PUBLIC, as they have hardware-based coherent device (GPU)
> > memory, and so the memory really is visible to the CPU. And the IBM team
> > was thinking of taking advantage of it. But I haven't seen anything on
> > that front for a while.
>
> Does anyone know who those people are and can we encourage them to
> send some patches? :)

I expect marking it CONFIG_BROKEN with the threat of deleting it if no
patches show up *is* the encouragement.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-14 Thread Christoph Hellwig
On Thu, Jun 13, 2019 at 05:43:15PM -0700, Ira Weiny wrote:
>   yes but the earlier patch:
> 
> [PATCH 03/22] mm: remove hmm_devmem_add_resource
> 
> Removes the only place type is set to MEMORY_DEVICE_PUBLIC.
> 
> So I think it is ok.  Frankly I was wondering if we should remove the public
> type altogether but conceptually it seems ok.  But I don't see any users of it
> so...  should we get rid of it in the code rather than turning the config off?

That was my original idea.  But then again Jerome spent a lot of effort
putting hooks for it all over the mm and it would seem a little root
to just rip this out ASAP.  I'll give it some more time, but it it doesn't
get used after a few more kernel releases we should nuke it.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-13 Thread Jason Gunthorpe
On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote:
> 
> On 6/13/19 12:44 PM, Jason Gunthorpe wrote:
> > On Thu, Jun 13, 2019 at 11:43:21AM +0200, Christoph Hellwig wrote:
> > > The code hasn't been used since it was added to the tree, and doesn't
> > > appear to actually be usable.  Mark it as BROKEN until either a user
> > > comes along or we finally give up on it.
> > > 
> > > Signed-off-by: Christoph Hellwig 
> > >   mm/Kconfig | 1 +
> > >   1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > index 0d2ba7e1f43e..406fa45e9ecc 100644
> > > +++ b/mm/Kconfig
> > > @@ -721,6 +721,7 @@ config DEVICE_PRIVATE
> > >   config DEVICE_PUBLIC
> > >   bool "Addressable device memory (like GPU memory)"
> > >   depends on ARCH_HAS_HMM
> > > + depends on BROKEN
> > >   select HMM
> > >   select DEV_PAGEMAP_OPS
> > 
> > This seems a bit harsh, we do have another kconfig that selects this
> > one today:
> > 
> > config DRM_NOUVEAU_SVM
> >  bool "(EXPERIMENTAL) Enable SVM (Shared Virtual Memory) support"
> >  depends on ARCH_HAS_HMM
> >  depends on DRM_NOUVEAU
> >  depends on STAGING
> >  select HMM_MIRROR
> >  select DEVICE_PRIVATE
> >  default n
> >  help
> >Say Y here if you want to enable experimental support for
> >Shared Virtual Memory (SVM).
> > 
> > Maybe it should be depends on STAGING not broken?
> > 
> > or maybe nouveau_svm doesn't actually need DEVICE_PRIVATE?
> > 
> > Jason
> 
> I think you are confusing DEVICE_PRIVATE for DEVICE_PUBLIC.
> DRM_NOUVEAU_SVM does use DEVICE_PRIVATE but not DEVICE_PUBLIC.

Indeed you are correct, never mind

Hum, so the only thing this config does is short circuit here:

static inline bool is_device_public_page(const struct page *page)
{
return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
IS_ENABLED(CONFIG_DEVICE_PUBLIC) &&
is_zone_device_page(page) &&
page->pgmap->type == MEMORY_DEVICE_PUBLIC;
}

Which is called all over the place.. 

So, yes, we really don't want any distro or something to turn this on
until it has a use.

Reviewed-by: Jason Gunthorpe 

Jason
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-13 Thread Jason Gunthorpe
On Thu, Jun 13, 2019 at 11:43:21AM +0200, Christoph Hellwig wrote:
> The code hasn't been used since it was added to the tree, and doesn't
> appear to actually be usable.  Mark it as BROKEN until either a user
> comes along or we finally give up on it.
> 
> Signed-off-by: Christoph Hellwig 
>  mm/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 0d2ba7e1f43e..406fa45e9ecc 100644
> +++ b/mm/Kconfig
> @@ -721,6 +721,7 @@ config DEVICE_PRIVATE
>  config DEVICE_PUBLIC
>   bool "Addressable device memory (like GPU memory)"
>   depends on ARCH_HAS_HMM
> + depends on BROKEN
>   select HMM
>   select DEV_PAGEMAP_OPS

This seems a bit harsh, we do have another kconfig that selects this
one today:

config DRM_NOUVEAU_SVM
bool "(EXPERIMENTAL) Enable SVM (Shared Virtual Memory) support"
depends on ARCH_HAS_HMM
depends on DRM_NOUVEAU
depends on STAGING
select HMM_MIRROR
select DEVICE_PRIVATE
default n
help
  Say Y here if you want to enable experimental support for
  Shared Virtual Memory (SVM).

Maybe it should be depends on STAGING not broken?

or maybe nouveau_svm doesn't actually need DEVICE_PRIVATE?

Jason
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

[Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken

2019-06-13 Thread Christoph Hellwig
The code hasn't been used since it was added to the tree, and doesn't
appear to actually be usable.  Mark it as BROKEN until either a user
comes along or we finally give up on it.

Signed-off-by: Christoph Hellwig 
---
 mm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index 0d2ba7e1f43e..406fa45e9ecc 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -721,6 +721,7 @@ config DEVICE_PRIVATE
 config DEVICE_PUBLIC
bool "Addressable device memory (like GPU memory)"
depends on ARCH_HAS_HMM
+   depends on BROKEN
select HMM
select DEV_PAGEMAP_OPS
 
-- 
2.20.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau