Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Christoph Hellwig
On Fri, Aug 16, 2019 at 12:43:07AM +, Jason Gunthorpe wrote:
> On Thu, Aug 15, 2019 at 04:51:33PM -0400, Jerome Glisse wrote:
> 
> > struct page. In this case any way we can update the
> > nouveau_dmem_page() to check that page page->pgmap == the
> > expected pgmap.
> 
> I was also wondering if that is a problem.. just blindly doing a
> container_of on the page->pgmap does seem like it assumes that only
> this driver is using DEVICE_PRIVATE.
> 
> It seems like something missing in hmm_range_fault, it should be told
> what DEVICE_PRIVATE is acceptable to trigger HMM_PFN_DEVICE_PRIVATE
> and fault all others?

The whole device private handling in hmm and migrate_vma seems pretty
broken as far as I can tell, and I have some WIP patches.  Basically we
should not touch (or possibly eventually call migrate to ram eventually
in the future) device private pages not owned by the caller, where I
try to defined the caller by the dev_pagemap_ops instance.  


Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Christoph Hellwig
On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
> So nor HMM nor driver should dereference the struct page (i do not
> think any iommu driver would either),

Both current hmm drivers convert the hmm pfn back to a page and
eventually call dma_map_page on it.  As do the ODP patches from you.


Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Dan Williams
On Thu, Aug 15, 2019 at 5:41 PM Jason Gunthorpe  wrote:
>
> On Thu, Aug 15, 2019 at 01:47:12PM -0700, Dan Williams wrote:
> > On Thu, Aug 15, 2019 at 1:41 PM Jason Gunthorpe  wrote:
> > >
> > > On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
> > >
> > > > So nor HMM nor driver should dereference the struct page (i do not
> > > > think any iommu driver would either),
> > >
> > > Er, they do technically deref the struct page:
> > >
> > > nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
> > >  struct hmm_range *range)
> > > struct page *page;
> > > page = hmm_pfn_to_page(range, range->pfns[i]);
> > > if (!nouveau_dmem_page(drm, page)) {
> > >
> > >
> > > nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
> > > {
> > > return is_device_private_page(page) && drm->dmem == 
> > > page_to_dmem(page)
> > >
> > >
> > > Which does touch 'page->pgmap'
> > >
> > > Is this OK without having a get_dev_pagemap() ?
> > >
> > > Noting that the collision-retry scheme doesn't protect anything here
> > > as we can have a concurrent invalidation while doing the above deref.
> >
> > As long take_driver_page_table_lock() in Jerome's flow can replace
> > percpu_ref_tryget_live() on the pagemap reference. It seems
> > nouveau_dmem_convert_pfn() happens after:
> >
> > mutex_lock(>mutex);
> > if (!nouveau_range_done()) {
> >
> > ...so I would expect that to be functionally equivalent to validating
> > the reference count.
>
> Yes, OK, that makes sense, I was mostly surprised by the statement the
> driver doesn't touch the struct page..
>
> I suppose "doesn't touch the struct page out of the driver lock" is
> the case.
>
> However, this means we cannot do any processing of ZONE_DEVICE pages
> outside the driver lock, so eg, doing any DMA map that might rely on
> MEMORY_DEVICE_PCI_P2PDMA has to be done in the driver lock, which is
> a bit unfortunate.

Wouldn't P2PDMA use page pins? Not needing to hold a lock over
ZONE_DEVICE page operations was one of the motivations for plumbing
get_dev_pagemap() with a percpu-ref.


Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Jason Gunthorpe
On Thu, Aug 15, 2019 at 04:51:33PM -0400, Jerome Glisse wrote:

> struct page. In this case any way we can update the
> nouveau_dmem_page() to check that page page->pgmap == the
> expected pgmap.

I was also wondering if that is a problem.. just blindly doing a
container_of on the page->pgmap does seem like it assumes that only
this driver is using DEVICE_PRIVATE.

It seems like something missing in hmm_range_fault, it should be told
what DEVICE_PRIVATE is acceptable to trigger HMM_PFN_DEVICE_PRIVATE
and fault all others?

Jason 


Re: [Nouveau] [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Jason Gunthorpe
On Thu, Aug 15, 2019 at 01:47:12PM -0700, Dan Williams wrote:
> On Thu, Aug 15, 2019 at 1:41 PM Jason Gunthorpe  wrote:
> >
> > On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
> >
> > > So nor HMM nor driver should dereference the struct page (i do not
> > > think any iommu driver would either),
> >
> > Er, they do technically deref the struct page:
> >
> > nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
> >  struct hmm_range *range)
> > struct page *page;
> > page = hmm_pfn_to_page(range, range->pfns[i]);
> > if (!nouveau_dmem_page(drm, page)) {
> >
> >
> > nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
> > {
> > return is_device_private_page(page) && drm->dmem == 
> > page_to_dmem(page)
> >
> >
> > Which does touch 'page->pgmap'
> >
> > Is this OK without having a get_dev_pagemap() ?
> >
> > Noting that the collision-retry scheme doesn't protect anything here
> > as we can have a concurrent invalidation while doing the above deref.
> 
> As long take_driver_page_table_lock() in Jerome's flow can replace
> percpu_ref_tryget_live() on the pagemap reference. It seems
> nouveau_dmem_convert_pfn() happens after:
>
> mutex_lock(>mutex);
> if (!nouveau_range_done()) {
> 
> ...so I would expect that to be functionally equivalent to validating
> the reference count.

Yes, OK, that makes sense, I was mostly surprised by the statement the
driver doesn't touch the struct page.. 

I suppose "doesn't touch the struct page out of the driver lock" is
the case.

However, this means we cannot do any processing of ZONE_DEVICE pages
outside the driver lock, so eg, doing any DMA map that might rely on
MEMORY_DEVICE_PCI_P2PDMA has to be done in the driver lock, which is
a bit unfortunate.

Jason
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [PATCH] PCI: Use pci_reset_bus() in quirk_reset_lenovo_thinkpad_50_nvgpu()

2019-08-15 Thread Bjorn Helgaas
On Thu, Aug 01, 2019 at 06:01:17PM -0400, Lyude Paul wrote:
> Since quirk_nvidia_hda() was added there's now two nvidia device
> functions on any laptops with nvidia GPUs: the HDA controller, and the
> GPU itself. Unfortunately this has the sideaffect of breaking
> quirk_reset_lenovo_thinkpad_50_nvgpu() since pci_reset_function() was
> using pci_parent_bus_reset() to reset the GPU's respective PCI bus, and
> pci_parent_bus_reset() does not work on busses which have more then a
> single device function present.
> 
> So, fix this by simply calling pci_reset_bus() instead which properly
> resets the GPU bus and all device functions under it, including both the
> GPU and the HDA controller.
> 
> Fixes: b516ea586d71 ("PCI: Enable NVIDIA HDA controllers")
> Cc: Lukas Wunner 
> Cc: Daniel Drake 
> Cc: Bjorn Helgaas 
> Cc: Aaron Plattner 
> Cc: Peter Wu 
> Cc: Ilia Mirkin 
> Cc: Karol Herbst 
> Cc: Maik Freudenberg 
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Lyude Paul 

We merged b516ea586d71 for v5.3, so I applied this with Ben's ack to
for-linus for v5.3, thanks!

> ---
>  drivers/pci/quirks.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 208aacf39329..44c4ae1abd00 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5256,7 +5256,7 @@ static void 
> quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)
>*/
>   if (ioread32(map + 0x2240c) & 0x2) {
>   pci_info(pdev, FW_BUG "GPU left initialized by EFI, 
> resetting\n");
> - ret = pci_reset_function(pdev);
> + ret = pci_reset_bus(pdev);
>   if (ret < 0)
>   pci_err(pdev, "Failed to reset GPU: %d\n", ret);
>   }
> -- 
> 2.21.0
> 


Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Jerome Glisse
On Thu, Aug 15, 2019 at 08:41:33PM +, Jason Gunthorpe wrote:
> On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
> 
> > So nor HMM nor driver should dereference the struct page (i do not
> > think any iommu driver would either),
> 
> Er, they do technically deref the struct page:
> 
> nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
>struct hmm_range *range)
>   struct page *page;
>   page = hmm_pfn_to_page(range, range->pfns[i]);
>   if (!nouveau_dmem_page(drm, page)) {
> 
> 
> nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
> {
>   return is_device_private_page(page) && drm->dmem == page_to_dmem(page)
> 
> 
> Which does touch 'page->pgmap'
> 
> Is this OK without having a get_dev_pagemap() ?
>
> Noting that the collision-retry scheme doesn't protect anything here
> as we can have a concurrent invalidation while doing the above deref.

Uh ? How so ? We are not reading the same code i think.

My read is that function is call when holding the device
lock which exclude any racing mmu notifier from making
forward progress and it is also protected by the range so
at the time this happens it is safe to dereference the
struct page. In this case any way we can update the
nouveau_dmem_page() to check that page page->pgmap == the
expected pgmap.

Cheers,
Jérôme


Re: [Nouveau] [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Dan Williams
On Thu, Aug 15, 2019 at 1:41 PM Jason Gunthorpe  wrote:
>
> On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:
>
> > So nor HMM nor driver should dereference the struct page (i do not
> > think any iommu driver would either),
>
> Er, they do technically deref the struct page:
>
> nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
>  struct hmm_range *range)
> struct page *page;
> page = hmm_pfn_to_page(range, range->pfns[i]);
> if (!nouveau_dmem_page(drm, page)) {
>
>
> nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
> {
> return is_device_private_page(page) && drm->dmem == page_to_dmem(page)
>
>
> Which does touch 'page->pgmap'
>
> Is this OK without having a get_dev_pagemap() ?
>
> Noting that the collision-retry scheme doesn't protect anything here
> as we can have a concurrent invalidation while doing the above deref.

As long take_driver_page_table_lock() in Jerome's flow can replace
percpu_ref_tryget_live() on the pagemap reference. It seems
nouveau_dmem_convert_pfn() happens after:

mutex_lock(>mutex);
if (!nouveau_range_done()) {

...so I would expect that to be functionally equivalent to validating
the reference count.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Jason Gunthorpe
On Thu, Aug 15, 2019 at 04:33:06PM -0400, Jerome Glisse wrote:

> So nor HMM nor driver should dereference the struct page (i do not
> think any iommu driver would either),

Er, they do technically deref the struct page:

nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
 struct hmm_range *range)
struct page *page;
page = hmm_pfn_to_page(range, range->pfns[i]);
if (!nouveau_dmem_page(drm, page)) {


nouveau_dmem_page(struct nouveau_drm *drm, struct page *page)
{
return is_device_private_page(page) && drm->dmem == page_to_dmem(page)


Which does touch 'page->pgmap'

Is this OK without having a get_dev_pagemap() ?

Noting that the collision-retry scheme doesn't protect anything here
as we can have a concurrent invalidation while doing the above deref.

Jason


Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Jerome Glisse
On Thu, Aug 15, 2019 at 01:12:22PM -0700, Dan Williams wrote:
> On Thu, Aug 15, 2019 at 12:44 PM Jerome Glisse  wrote:
> >
> > On Thu, Aug 15, 2019 at 12:36:58PM -0700, Dan Williams wrote:
> > > On Thu, Aug 15, 2019 at 11:07 AM Jerome Glisse  wrote:
> > > >
> > > > On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > > > > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > > > > Section alignment constraints somewhat save us here. The only 
> > > > > > > > example
> > > > > > > > I can think of a PMD not containing a uniform pgmap association 
> > > > > > > > for
> > > > > > > > each pte is the case when the pgmap overlaps normal dram, i.e. 
> > > > > > > > shares
> > > > > > > > the same 'struct memory_section' for a given span. Otherwise, 
> > > > > > > > distinct
> > > > > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > > > > guarantee different mapping lifetimes.
> > > > > > > >
> > > > > > > > That said, this seems to want a better mechanism to determine 
> > > > > > > > "pfn is
> > > > > > > > ZONE_DEVICE".
> > > > > > >
> > > > > > > So I guess this patch is fine for now, and once you provide a 
> > > > > > > better
> > > > > > > mechanism we can switch over to it?
> > > > > >
> > > > > > What about the version I sent to just get rid of all the strange
> > > > > > put_dev_pagemaps while scanning? Odds are good we will work with 
> > > > > > only
> > > > > > a single pagemap, so it makes some sense to cache it once we find 
> > > > > > it?
> > > > >
> > > > > Yes, if the scan is over a single pmd then caching it makes sense.
> > > >
> > > > Quite frankly an easier an better solution is to remove the pagemap
> > > > lookup as HMM user abide by mmu notifier it means we will not make
> > > > use or dereference the struct page so that we are safe from any
> > > > racing hotunplug of dax memory (as long as device driver using hmm
> > > > do not have a bug).
> > >
> > > Yes, as long as the driver remove is synchronized against HMM
> > > operations via another mechanism then there is no need to take pagemap
> > > references. Can you briefly describe what that other mechanism is?
> >
> > So if you hotunplug some dax memory i assume that this can only
> > happens once all the pages are unmapped (as it must have the
> > zero refcount, well 1 because of the bias) and any unmap will
> > trigger a mmu notifier callback. User of hmm mirror abiding by
> > the API will never make use of information they get through the
> > fault or snapshot function until checking for racing notifier
> > under lock.
> 
> Hmm that first assumption is not guaranteed by the dev_pagemap core.
> The dev_pagemap end of life model is "disable, invalidate, drain" so
> it's possible to call devm_munmap_pages() while pages are still mapped
> it just won't complete the teardown of the pagemap until the last
> reference is dropped. New references are blocked during this teardown.
> 
> However, if the driver is validating the liveness of the mapping in
> the mmu-notifier path and blocking new references it sounds like it
> should be ok. Might there be GPU driver unit tests that cover this
> racing teardown case?

So nor HMM nor driver should dereference the struct page (i do not
think any iommu driver would either), they only care about the pfn.
So even if we race with a teardown as soon as we get the mmu notifier
callback to invalidate the mmapping we will do so. The pattern is:

mydriver_populate_vaddr_range(start, end) {
hmm_range_register(range, start, end)
again:
ret = hmm_range_fault(start, end)
if (ret < 0)
return ret;

take_driver_page_table_lock();
if (range.valid) {
populate_device_page_table();
release_driver_page_table_lock();
} else {
release_driver_page_table_lock();
goto again;
}
}

The mmu notifier callback do use the same page table lock and we
also have the range tracking going on. So either we populate
device page table before racing with teardown in which case the
device page table entry are clear through the mmu notifier call
back. Or if we race, but then we can see the racing mmu notifier
calls and retry again which will trigger a regular page fault
which will return an error i assume.

So in the end we have the exact same behavior as if a CPU was trying
to access that virtual address. This is the whole point of HMM, to
behave exactly as if it was a CPU access. Fails in the same way,
race in the same way. So if DAX teardown are safe versus racing
CPU access to some vma that have that memory map, it will be the
same for HMM users.


GPU driver test suite are not good at testing this. 

Re: [Nouveau] [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Dan Williams
On Thu, Aug 15, 2019 at 12:44 PM Jerome Glisse  wrote:
>
> On Thu, Aug 15, 2019 at 12:36:58PM -0700, Dan Williams wrote:
> > On Thu, Aug 15, 2019 at 11:07 AM Jerome Glisse  wrote:
> > >
> > > On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > > > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe  
> > > > wrote:
> > > > >
> > > > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > > > Section alignment constraints somewhat save us here. The only 
> > > > > > > example
> > > > > > > I can think of a PMD not containing a uniform pgmap association 
> > > > > > > for
> > > > > > > each pte is the case when the pgmap overlaps normal dram, i.e. 
> > > > > > > shares
> > > > > > > the same 'struct memory_section' for a given span. Otherwise, 
> > > > > > > distinct
> > > > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > > > guarantee different mapping lifetimes.
> > > > > > >
> > > > > > > That said, this seems to want a better mechanism to determine 
> > > > > > > "pfn is
> > > > > > > ZONE_DEVICE".
> > > > > >
> > > > > > So I guess this patch is fine for now, and once you provide a better
> > > > > > mechanism we can switch over to it?
> > > > >
> > > > > What about the version I sent to just get rid of all the strange
> > > > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > > > a single pagemap, so it makes some sense to cache it once we find it?
> > > >
> > > > Yes, if the scan is over a single pmd then caching it makes sense.
> > >
> > > Quite frankly an easier an better solution is to remove the pagemap
> > > lookup as HMM user abide by mmu notifier it means we will not make
> > > use or dereference the struct page so that we are safe from any
> > > racing hotunplug of dax memory (as long as device driver using hmm
> > > do not have a bug).
> >
> > Yes, as long as the driver remove is synchronized against HMM
> > operations via another mechanism then there is no need to take pagemap
> > references. Can you briefly describe what that other mechanism is?
>
> So if you hotunplug some dax memory i assume that this can only
> happens once all the pages are unmapped (as it must have the
> zero refcount, well 1 because of the bias) and any unmap will
> trigger a mmu notifier callback. User of hmm mirror abiding by
> the API will never make use of information they get through the
> fault or snapshot function until checking for racing notifier
> under lock.

Hmm that first assumption is not guaranteed by the dev_pagemap core.
The dev_pagemap end of life model is "disable, invalidate, drain" so
it's possible to call devm_munmap_pages() while pages are still mapped
it just won't complete the teardown of the pagemap until the last
reference is dropped. New references are blocked during this teardown.

However, if the driver is validating the liveness of the mapping in
the mmu-notifier path and blocking new references it sounds like it
should be ok. Might there be GPU driver unit tests that cover this
racing teardown case?
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Jerome Glisse
On Thu, Aug 15, 2019 at 12:36:58PM -0700, Dan Williams wrote:
> On Thu, Aug 15, 2019 at 11:07 AM Jerome Glisse  wrote:
> >
> > On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe  wrote:
> > > >
> > > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > > Section alignment constraints somewhat save us here. The only 
> > > > > > example
> > > > > > I can think of a PMD not containing a uniform pgmap association for
> > > > > > each pte is the case when the pgmap overlaps normal dram, i.e. 
> > > > > > shares
> > > > > > the same 'struct memory_section' for a given span. Otherwise, 
> > > > > > distinct
> > > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > > guarantee different mapping lifetimes.
> > > > > >
> > > > > > That said, this seems to want a better mechanism to determine "pfn 
> > > > > > is
> > > > > > ZONE_DEVICE".
> > > > >
> > > > > So I guess this patch is fine for now, and once you provide a better
> > > > > mechanism we can switch over to it?
> > > >
> > > > What about the version I sent to just get rid of all the strange
> > > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > > a single pagemap, so it makes some sense to cache it once we find it?
> > >
> > > Yes, if the scan is over a single pmd then caching it makes sense.
> >
> > Quite frankly an easier an better solution is to remove the pagemap
> > lookup as HMM user abide by mmu notifier it means we will not make
> > use or dereference the struct page so that we are safe from any
> > racing hotunplug of dax memory (as long as device driver using hmm
> > do not have a bug).
> 
> Yes, as long as the driver remove is synchronized against HMM
> operations via another mechanism then there is no need to take pagemap
> references. Can you briefly describe what that other mechanism is?

So if you hotunplug some dax memory i assume that this can only
happens once all the pages are unmapped (as it must have the
zero refcount, well 1 because of the bias) and any unmap will
trigger a mmu notifier callback. User of hmm mirror abiding by
the API will never make use of information they get through the
fault or snapshot function until checking for racing notifier
under lock.

Cheers,
Jérôme


Re: [Nouveau] [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Dan Williams
On Thu, Aug 15, 2019 at 11:07 AM Jerome Glisse  wrote:
>
> On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe  wrote:
> > >
> > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > Section alignment constraints somewhat save us here. The only example
> > > > > I can think of a PMD not containing a uniform pgmap association for
> > > > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > guarantee different mapping lifetimes.
> > > > >
> > > > > That said, this seems to want a better mechanism to determine "pfn is
> > > > > ZONE_DEVICE".
> > > >
> > > > So I guess this patch is fine for now, and once you provide a better
> > > > mechanism we can switch over to it?
> > >
> > > What about the version I sent to just get rid of all the strange
> > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > a single pagemap, so it makes some sense to cache it once we find it?
> >
> > Yes, if the scan is over a single pmd then caching it makes sense.
>
> Quite frankly an easier an better solution is to remove the pagemap
> lookup as HMM user abide by mmu notifier it means we will not make
> use or dereference the struct page so that we are safe from any
> racing hotunplug of dax memory (as long as device driver using hmm
> do not have a bug).

Yes, as long as the driver remove is synchronized against HMM
operations via another mechanism then there is no need to take pagemap
references. Can you briefly describe what that other mechanism is?
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Jason Gunthorpe
On Thu, Aug 15, 2019 at 02:03:25PM -0400, Jerome Glisse wrote:
> On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> > On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe  wrote:
> > >
> > > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > > Section alignment constraints somewhat save us here. The only example
> > > > > I can think of a PMD not containing a uniform pgmap association for
> > > > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > > subsections as of v5.3). Otherwise the implementation could not
> > > > > guarantee different mapping lifetimes.
> > > > >
> > > > > That said, this seems to want a better mechanism to determine "pfn is
> > > > > ZONE_DEVICE".
> > > >
> > > > So I guess this patch is fine for now, and once you provide a better
> > > > mechanism we can switch over to it?
> > >
> > > What about the version I sent to just get rid of all the strange
> > > put_dev_pagemaps while scanning? Odds are good we will work with only
> > > a single pagemap, so it makes some sense to cache it once we find it?
> > 
> > Yes, if the scan is over a single pmd then caching it makes sense.
> 
> Quite frankly an easier an better solution is to remove the pagemap
> lookup as HMM user abide by mmu notifier it means we will not make
> use or dereference the struct page so that we are safe from any
> racing hotunplug of dax memory (as long as device driver using hmm
> do not have a bug).

Yes, I also would prefer to drop the confusing checks entirely -
Christoph can you resend this patch?

Thanks,
Jason 
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Alex Deucher
On Thu, Aug 15, 2019 at 12:19 PM  wrote:
>
> > -Original Message-
> > From: Takashi Iwai 
> > Sent: Thursday, August 15, 2019 9:57 AM
> > To: Alex Deucher
> > Cc: Karol Herbst; Limonciello, Mario; nouveau; Rafael J . Wysocki; LKML; 
> > dri-devel;
> > Linux ACPI Mailing List; Alex Hung; Ben Skeggs; David Airlie
> > Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string 
> > to
> > enable dGPU direct output"
> >
> >
> > [EXTERNAL EMAIL]
> >
> > On Thu, 15 Aug 2019 16:37:05 +0200,
> > Alex Deucher wrote:
> > >
> > > On Thu, Aug 15, 2019 at 10:25 AM Karol Herbst  wrote:
> > > >
> > > > On Thu, Aug 15, 2019 at 4:20 PM  wrote:
> > > > >
> > > > > > > There are definitely going to be regressions on machines in the 
> > > > > > > field
> > with the
> > > > > > > in tree drivers by reverting this.  I think we should have an 
> > > > > > > answer for all
> > of
> > > > > > those
> > > > > > > before this revert is accepted.
> > > > > > >
> > > > > > > Regarding systems with Intel+NVIDIA, we'll have to work with 
> > > > > > > partners
> > to
> > > > > > collect
> > > > > > > some information on the impact of reverting this.
> > > > > > >
> > > > > > > When this is used on a system with Intel+AMD the ASL configures 
> > > > > > > AMD
> > GPU to
> > > > > > use
> > > > > > > "Hybrid Graphics" when on Windows and "Power Express" and
> > "Switchable
> > > > > > Graphics"
> > > > > > > when on Linux.
> > > > > >
> > > > > > and what's exactly the difference between those? And what's the 
> > > > > > actual
> > > > > > issue here?
> > > > >
> > > > > DP/HDMI is not detected unless plugged in at bootup.  It's due to 
> > > > > missing
> > HPD
> > > > > events.
> > > > >
> > > >
> > > > afaik Lyude was working on fixing all that, at least for some drivers.
> > > > If there is something wrong, we still should fix the drivers, not
> > > > adding ACPI workarounds.
> > > >
> > > > Alex: do you know if there are remaining issues regarding that with 
> > > > amdgpu?
> > >
> > > There was an issue with hpd events not making it to the audio side
> > > when things were powered down that was fixed with this patch set:
> > > https://patchwork.freedesktop.org/patch/316793/
> > > Those patches depended on a bunch of alsa changes as well which may
> > > have not been available in the distro used for a particular OEM
> > > program.
> >
> > FYI, the corresponding commit for ALSA part is destined for 5.4
> > kernel:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=ade
> > 49db337a9d44ac5835cfce1ee873549011b27
> >
> > BTW, Nouveau should suffer from the same problem.  The patch to add
> > the audio component support is found at:
> >   https://patchwork.freedesktop.org/patch/319131/
> >
> >
>
> It sounds like 5.3rcX won't be a useful check then.
>
> So am I correct to understand that everything related to the AMD failures
> described in this thread should be in linux-next at this point?
>

Assuming you mean the missing audio hotplug events, then yes.

Alex
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [RFC] drm: Bump encoder limit from 32 to 64

2019-08-15 Thread Daniel Vetter
On Thu, Aug 15, 2019 at 8:03 PM Lyude Paul  wrote:
>
> Assuming that GPUs would never have even close to 32 separate video
> encoders is quite honestly a pretty reasonable assumption. Unfortunately
> we do not live in a reasonable world, as it looks like it is actually
> possible to find devices that will create more drm_encoder objects then
> this. Case in point: the ThinkPad P71's discrete GPU, which exposes 1
> eDP port and 5 DP ports. On the P71, nouveau attempts to create one
> encoder for the eDP port, and two encoders for each DP++/USB-C port
> along with 4 MST encoders for each DP port. This comes out to 35
> different encoders. Unfortunately, this can't really be optimized to
> make less encoders either.
>
> So, what if we bumped the limit to 64? Unfortunately this has one very
> awkward drawback: we already expose 32-bit bitmasks for encoders to
> userspace in drm_encoder->possible_clones. Yikes. Luckily for us
> however, the year is 2019 and modern hardware that supports cloning is
> basically non-existent.
>
> So, let's try to compromise here: allow encoders with indexes <32 to
> have non-zero values in drm_encoder->possible_clones, and don't allow
> encoders with higher indexes to set drm_encoder->possible_clones to a
> non-zero value. This allows us to avoid breaking UAPI, while still being
> able to bump up the encoder limit.
>
> This also fixes driver probing for nouveau on the ThinkPad P71.
>
> Signed-off-by: Lyude Paul 
> Cc: nouveau@lists.freedesktop.org

We should never have exposed encoders, that was such a mistake ...

Does nouveau really still use encoders as datastructures internally
for modeset? Otherwise could we perhaps just expose a set of fake
encoders (one per connector) and give up on all these tricks?

Maybe even as a drm level generic thing, i.e. at drm_register_time we
check whether there's any encoders with ->possible_clones set (the
only real information in an encoder, and fairly useless on modern hw
like you point out). And if that's the case we hide all the "real"
encoders from userspace and put up a pile of fake ones. Which do
nothing else than keep userspace happy. Thoughts?

> ---
>  drivers/gpu/drm/drm_atomic.c  |  2 +-
>  drivers/gpu/drm/drm_encoder.c | 10 --
>  include/drm/drm_crtc.h|  2 +-
>  include/drm/drm_encoder.h | 20 +++-
>  4 files changed, 25 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
> index 419381abbdd1..27ce988ef0cc 100644
> --- a/drivers/gpu/drm/drm_atomic.c
> +++ b/drivers/gpu/drm/drm_atomic.c
> @@ -392,7 +392,7 @@ static void drm_atomic_crtc_print_state(struct 
> drm_printer *p,
> drm_printf(p, "\tcolor_mgmt_changed=%d\n", state->color_mgmt_changed);
> drm_printf(p, "\tplane_mask=%x\n", state->plane_mask);
> drm_printf(p, "\tconnector_mask=%x\n", state->connector_mask);
> -   drm_printf(p, "\tencoder_mask=%x\n", state->encoder_mask);
> +   drm_printf(p, "\tencoder_mask=%llx\n", state->encoder_mask);
> drm_printf(p, "\tmode: " DRM_MODE_FMT "\n", 
> DRM_MODE_ARG(>mode));
>
> if (crtc->funcs->atomic_print_state)
> diff --git a/drivers/gpu/drm/drm_encoder.c b/drivers/gpu/drm/drm_encoder.c
> index 7fb47b7b8b44..e4b8f675aa81 100644
> --- a/drivers/gpu/drm/drm_encoder.c
> +++ b/drivers/gpu/drm/drm_encoder.c
> @@ -112,8 +112,14 @@ int drm_encoder_init(struct drm_device *dev,
>  {
> int ret;
>
> -   /* encoder index is used with 32bit bitmasks */
> -   if (WARN_ON(dev->mode_config.num_encoder >= 32))
> +   /*
> +* Since possible_clones has been exposed to userspace as a 32bit
> +* bitmask, we don't allow creating encoders with an index >=32 which
> +* are capable of cloning.
> +*/
> +   if (WARN_ON(dev->mode_config.num_encoder >= 64) ||
> +   WARN_ON(dev->mode_config.num_encoder >= 32 &&
> +   encoder->possible_clones))

This is too early, drivers are free to change ->possible_clones later
on. We've have to recheck that at drm_dev_register() time I think.
-Daniel

> return -EINVAL;
>
> ret = drm_mode_object_add(dev, >base, 
> DRM_MODE_OBJECT_ENCODER);
> diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h
> index 7d14c11bdc0a..fd0b2438c3d5 100644
> --- a/include/drm/drm_crtc.h
> +++ b/include/drm/drm_crtc.h
> @@ -210,7 +210,7 @@ struct drm_crtc_state {
>  * @encoder_mask: Bitmask of drm_encoder_mask(encoder) of encoders
>  * attached to this CRTC.
>  */
> -   u32 encoder_mask;
> +   u64 encoder_mask;
>
> /**
>  * @adjusted_mode:
> diff --git a/include/drm/drm_encoder.h b/include/drm/drm_encoder.h
> index 70cfca03d812..3f9cb65694e1 100644
> --- a/include/drm/drm_encoder.h
> +++ b/include/drm/drm_encoder.h
> @@ -159,7 +159,15 @@ struct drm_encoder {
>  * encoders can be used in a cloned configuration, they both should 
> have
>  * each 

Re: [PATCH v7 1/9] drm_dp_cec: add connector info support.

2019-08-15 Thread Lyude Paul
Reviewed-by: Lyude Paul 

On Wed, 2019-08-14 at 12:44 +0200, Dariusz Marcinkiewicz wrote:
> Pass the connector info to the CEC adapter. This makes it possible
> to associate the CEC adapter with the corresponding drm connector.
> 
> Signed-off-by: Dariusz Marcinkiewicz 
> Signed-off-by: Hans Verkuil 
> Tested-by: Hans Verkuil 
> ---
>  .../display/amdgpu_dm/amdgpu_dm_mst_types.c   |  2 +-
>  drivers/gpu/drm/drm_dp_cec.c  | 25 ---
>  drivers/gpu/drm/i915/display/intel_dp.c   |  4 +--
>  drivers/gpu/drm/nouveau/nouveau_connector.c   |  3 +--
>  include/drm/drm_dp_helper.h   | 17 ++---
>  5 files changed, 27 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> index 16218a202b591..5ec14efd4d8cb 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> @@ -416,7 +416,7 @@ void amdgpu_dm_initialize_dp_connector(struct
> amdgpu_display_manager *dm,
>  
>   drm_dp_aux_register(>dm_dp_aux.aux);
>   drm_dp_cec_register_connector(>dm_dp_aux.aux,
> -   aconnector->base.name, dm->adev->dev);
> +   >base);
>   aconnector->mst_mgr.cbs = _mst_cbs;
>   drm_dp_mst_topology_mgr_init(
>   >mst_mgr,
> diff --git a/drivers/gpu/drm/drm_dp_cec.c b/drivers/gpu/drm/drm_dp_cec.c
> index b15cee85b702b..b457c16c3a8bb 100644
> --- a/drivers/gpu/drm/drm_dp_cec.c
> +++ b/drivers/gpu/drm/drm_dp_cec.c
> @@ -8,7 +8,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> +#include 
>  #include 
>  
>  /*
> @@ -295,7 +297,10 @@ static void drm_dp_cec_unregister_work(struct work_struct
> *work)
>   */
>  void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const struct edid *edid)
>  {
> - u32 cec_caps = CEC_CAP_DEFAULTS | CEC_CAP_NEEDS_HPD;
> + struct drm_connector *connector = aux->cec.connector;
> + u32 cec_caps = CEC_CAP_DEFAULTS | CEC_CAP_NEEDS_HPD |
> +CEC_CAP_CONNECTOR_INFO;
> + struct cec_connector_info conn_info;
>   unsigned int num_las = 1;
>   u8 cap;
>  
> @@ -344,13 +349,17 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const
> struct edid *edid)
>  
>   /* Create a new adapter */
>   aux->cec.adap = cec_allocate_adapter(_dp_cec_adap_ops,
> -  aux, aux->cec.name, cec_caps,
> +  aux, connector->name, cec_caps,
>num_las);
>   if (IS_ERR(aux->cec.adap)) {
>   aux->cec.adap = NULL;
>   goto unlock;
>   }
> - if (cec_register_adapter(aux->cec.adap, aux->cec.parent)) {
> +
> + cec_fill_conn_info_from_drm(_info, connector);
> + cec_s_conn_info(aux->cec.adap, _info);
> +
> + if (cec_register_adapter(aux->cec.adap, connector->dev->dev)) {
>   cec_delete_adapter(aux->cec.adap);
>   aux->cec.adap = NULL;
>   } else {
> @@ -406,22 +415,20 @@ EXPORT_SYMBOL(drm_dp_cec_unset_edid);
>  /**
>   * drm_dp_cec_register_connector() - register a new connector
>   * @aux: DisplayPort AUX channel
> - * @name: name of the CEC device
> - * @parent: parent device
> + * @connector: drm connector
>   *
>   * A new connector was registered with associated CEC adapter name and
>   * CEC adapter parent device. After registering the name and parent
>   * drm_dp_cec_set_edid() is called to check if the connector supports
>   * CEC and to register a CEC adapter if that is the case.
>   */
> -void drm_dp_cec_register_connector(struct drm_dp_aux *aux, const char *name,
> -struct device *parent)
> +void drm_dp_cec_register_connector(struct drm_dp_aux *aux,
> +struct drm_connector *connector)
>  {
>   WARN_ON(aux->cec.adap);
>   if (WARN_ON(!aux->transfer))
>   return;
> - aux->cec.name = name;
> - aux->cec.parent = parent;
> + aux->cec.connector = connector;
>   INIT_DELAYED_WORK(>cec.unregister_work,
> drm_dp_cec_unregister_work);
>  }
> diff --git a/drivers/gpu/drm/i915/display/intel_dp.c
> b/drivers/gpu/drm/i915/display/intel_dp.c
> index 1092499115760..de2486fe7bf2d 100644
> --- a/drivers/gpu/drm/i915/display/intel_dp.c
> +++ b/drivers/gpu/drm/i915/display/intel_dp.c
> @@ -5497,7 +5497,6 @@ static int
>  intel_dp_connector_register(struct drm_connector *connector)
>  {
>   struct intel_dp *intel_dp = intel_attached_dp(connector);
> - struct drm_device *dev = connector->dev;
>   int ret;
>  
>   ret = intel_connector_register(connector);
> @@ -5512,8 +5511,7 @@ intel_dp_connector_register(struct drm_connector
> *connector)
>   intel_dp->aux.dev = connector->kdev;
>   ret = 

[Nouveau] [RFC] drm: Bump encoder limit from 32 to 64

2019-08-15 Thread Lyude Paul
Assuming that GPUs would never have even close to 32 separate video
encoders is quite honestly a pretty reasonable assumption. Unfortunately
we do not live in a reasonable world, as it looks like it is actually
possible to find devices that will create more drm_encoder objects then
this. Case in point: the ThinkPad P71's discrete GPU, which exposes 1
eDP port and 5 DP ports. On the P71, nouveau attempts to create one
encoder for the eDP port, and two encoders for each DP++/USB-C port
along with 4 MST encoders for each DP port. This comes out to 35
different encoders. Unfortunately, this can't really be optimized to
make less encoders either.

So, what if we bumped the limit to 64? Unfortunately this has one very
awkward drawback: we already expose 32-bit bitmasks for encoders to
userspace in drm_encoder->possible_clones. Yikes. Luckily for us
however, the year is 2019 and modern hardware that supports cloning is
basically non-existent.

So, let's try to compromise here: allow encoders with indexes <32 to
have non-zero values in drm_encoder->possible_clones, and don't allow
encoders with higher indexes to set drm_encoder->possible_clones to a
non-zero value. This allows us to avoid breaking UAPI, while still being
able to bump up the encoder limit.

This also fixes driver probing for nouveau on the ThinkPad P71.

Signed-off-by: Lyude Paul 
Cc: nouveau@lists.freedesktop.org
---
 drivers/gpu/drm/drm_atomic.c  |  2 +-
 drivers/gpu/drm/drm_encoder.c | 10 --
 include/drm/drm_crtc.h|  2 +-
 include/drm/drm_encoder.h | 20 +++-
 4 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 419381abbdd1..27ce988ef0cc 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -392,7 +392,7 @@ static void drm_atomic_crtc_print_state(struct drm_printer 
*p,
drm_printf(p, "\tcolor_mgmt_changed=%d\n", state->color_mgmt_changed);
drm_printf(p, "\tplane_mask=%x\n", state->plane_mask);
drm_printf(p, "\tconnector_mask=%x\n", state->connector_mask);
-   drm_printf(p, "\tencoder_mask=%x\n", state->encoder_mask);
+   drm_printf(p, "\tencoder_mask=%llx\n", state->encoder_mask);
drm_printf(p, "\tmode: " DRM_MODE_FMT "\n", DRM_MODE_ARG(>mode));
 
if (crtc->funcs->atomic_print_state)
diff --git a/drivers/gpu/drm/drm_encoder.c b/drivers/gpu/drm/drm_encoder.c
index 7fb47b7b8b44..e4b8f675aa81 100644
--- a/drivers/gpu/drm/drm_encoder.c
+++ b/drivers/gpu/drm/drm_encoder.c
@@ -112,8 +112,14 @@ int drm_encoder_init(struct drm_device *dev,
 {
int ret;
 
-   /* encoder index is used with 32bit bitmasks */
-   if (WARN_ON(dev->mode_config.num_encoder >= 32))
+   /*
+* Since possible_clones has been exposed to userspace as a 32bit
+* bitmask, we don't allow creating encoders with an index >=32 which
+* are capable of cloning.
+*/
+   if (WARN_ON(dev->mode_config.num_encoder >= 64) ||
+   WARN_ON(dev->mode_config.num_encoder >= 32 &&
+   encoder->possible_clones))
return -EINVAL;
 
ret = drm_mode_object_add(dev, >base, DRM_MODE_OBJECT_ENCODER);
diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h
index 7d14c11bdc0a..fd0b2438c3d5 100644
--- a/include/drm/drm_crtc.h
+++ b/include/drm/drm_crtc.h
@@ -210,7 +210,7 @@ struct drm_crtc_state {
 * @encoder_mask: Bitmask of drm_encoder_mask(encoder) of encoders
 * attached to this CRTC.
 */
-   u32 encoder_mask;
+   u64 encoder_mask;
 
/**
 * @adjusted_mode:
diff --git a/include/drm/drm_encoder.h b/include/drm/drm_encoder.h
index 70cfca03d812..3f9cb65694e1 100644
--- a/include/drm/drm_encoder.h
+++ b/include/drm/drm_encoder.h
@@ -159,7 +159,15 @@ struct drm_encoder {
 * encoders can be used in a cloned configuration, they both should have
 * each another bits set.
 *
-* In reality almost every driver gets this wrong.
+* In reality almost every driver gets this wrong, and most modern
+* display hardware does not have support for cloning. As well, while we
+* expose this mask to userspace as 32bits long, we do sure purely to
+* avoid breaking pre-existing UAPI since the limitation on the number
+* of encoders has been increased from 32 bits to 64 bits. In order to
+* maintain functionality for drivers which do actually support cloning,
+* we only allow cloning with encoders that have an index <32. Encoders
+* with indexes higher than 32 are not allowed to specify a non-zero
+* value here.
 *
 * Note that since encoder objects can't be hotplugged the assigned 
indices
 * are stable and hence known before registering all objects.
@@ -198,13 +206,15 @@ static inline unsigned int drm_encoder_index(const struct 
drm_encoder *encoder)
 }
 
 /**
- * 

Re: [Nouveau] [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk

2019-08-15 Thread Jerome Glisse
On Wed, Aug 14, 2019 at 07:48:28AM -0700, Dan Williams wrote:
> On Wed, Aug 14, 2019 at 6:28 AM Jason Gunthorpe  wrote:
> >
> > On Wed, Aug 14, 2019 at 09:38:54AM +0200, Christoph Hellwig wrote:
> > > On Tue, Aug 13, 2019 at 06:36:33PM -0700, Dan Williams wrote:
> > > > Section alignment constraints somewhat save us here. The only example
> > > > I can think of a PMD not containing a uniform pgmap association for
> > > > each pte is the case when the pgmap overlaps normal dram, i.e. shares
> > > > the same 'struct memory_section' for a given span. Otherwise, distinct
> > > > pgmaps arrange to manage their own exclusive sections (and now
> > > > subsections as of v5.3). Otherwise the implementation could not
> > > > guarantee different mapping lifetimes.
> > > >
> > > > That said, this seems to want a better mechanism to determine "pfn is
> > > > ZONE_DEVICE".
> > >
> > > So I guess this patch is fine for now, and once you provide a better
> > > mechanism we can switch over to it?
> >
> > What about the version I sent to just get rid of all the strange
> > put_dev_pagemaps while scanning? Odds are good we will work with only
> > a single pagemap, so it makes some sense to cache it once we find it?
> 
> Yes, if the scan is over a single pmd then caching it makes sense.

Quite frankly an easier an better solution is to remove the pagemap
lookup as HMM user abide by mmu notifier it means we will not make
use or dereference the struct page so that we are safe from any
racing hotunplug of dax memory (as long as device driver using hmm
do not have a bug).

Cheers,
Jérôme
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Takashi Iwai
On Thu, 15 Aug 2019 18:19:52 +0200,
 wrote:
> 
> > -Original Message-
> > From: Takashi Iwai 
> > Sent: Thursday, August 15, 2019 9:57 AM
> > To: Alex Deucher
> > Cc: Karol Herbst; Limonciello, Mario; nouveau; Rafael J . Wysocki; LKML; 
> > dri-devel;
> > Linux ACPI Mailing List; Alex Hung; Ben Skeggs; David Airlie
> > Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string 
> > to
> > enable dGPU direct output"
> > 
> > 
> > [EXTERNAL EMAIL]
> > 
> > On Thu, 15 Aug 2019 16:37:05 +0200,
> > Alex Deucher wrote:
> > >
> > > On Thu, Aug 15, 2019 at 10:25 AM Karol Herbst  wrote:
> > > >
> > > > On Thu, Aug 15, 2019 at 4:20 PM  wrote:
> > > > >
> > > > > > > There are definitely going to be regressions on machines in the 
> > > > > > > field
> > with the
> > > > > > > in tree drivers by reverting this.  I think we should have an 
> > > > > > > answer for all
> > of
> > > > > > those
> > > > > > > before this revert is accepted.
> > > > > > >
> > > > > > > Regarding systems with Intel+NVIDIA, we'll have to work with 
> > > > > > > partners
> > to
> > > > > > collect
> > > > > > > some information on the impact of reverting this.
> > > > > > >
> > > > > > > When this is used on a system with Intel+AMD the ASL configures 
> > > > > > > AMD
> > GPU to
> > > > > > use
> > > > > > > "Hybrid Graphics" when on Windows and "Power Express" and
> > "Switchable
> > > > > > Graphics"
> > > > > > > when on Linux.
> > > > > >
> > > > > > and what's exactly the difference between those? And what's the 
> > > > > > actual
> > > > > > issue here?
> > > > >
> > > > > DP/HDMI is not detected unless plugged in at bootup.  It's due to 
> > > > > missing
> > HPD
> > > > > events.
> > > > >
> > > >
> > > > afaik Lyude was working on fixing all that, at least for some drivers.
> > > > If there is something wrong, we still should fix the drivers, not
> > > > adding ACPI workarounds.
> > > >
> > > > Alex: do you know if there are remaining issues regarding that with 
> > > > amdgpu?
> > >
> > > There was an issue with hpd events not making it to the audio side
> > > when things were powered down that was fixed with this patch set:
> > > https://patchwork.freedesktop.org/patch/316793/
> > > Those patches depended on a bunch of alsa changes as well which may
> > > have not been available in the distro used for a particular OEM
> > > program.
> > 
> > FYI, the corresponding commit for ALSA part is destined for 5.4
> > kernel:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=ade
> > 49db337a9d44ac5835cfce1ee873549011b27
> > 
> > BTW, Nouveau should suffer from the same problem.  The patch to add
> > the audio component support is found at:
> >   https://patchwork.freedesktop.org/patch/319131/
> > 
> > 
> 
> It sounds like 5.3rcX won't be a useful check then.

For HDMI/DP audio, right, the main pieces are still missing in 5.3.
But there is no regression in this regard, OTOH.

> So am I correct to understand that everything related to the AMD failures
> described in this thread should be in linux-next at this point?

I'm not sure what you're referring as "AMD failures", so leave this
question to AMD people :)


Takashi
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH] nouveau/hmm: map pages after migration

2019-08-15 Thread Jerome Glisse
On Tue, Aug 13, 2019 at 05:58:52PM -0400, Jerome Glisse wrote:
> On Wed, Aug 07, 2019 at 08:02:14AM -0700, Ralph Campbell wrote:
> > When memory is migrated to the GPU it is likely to be accessed by GPU
> > code soon afterwards. Instead of waiting for a GPU fault, map the
> > migrated memory into the GPU page tables with the same access permissions
> > as the source CPU page table entries. This preserves copy on write
> > semantics.
> > 
> > Signed-off-by: Ralph Campbell 
> > Cc: Christoph Hellwig 
> > Cc: Jason Gunthorpe 
> > Cc: "Jérôme Glisse" 
> > Cc: Ben Skeggs 
> 
> Sorry for delay i am swamp, couple issues:
> - nouveau_pfns_map() is never call, it should be call after
>   the dma copy is done (iirc it is lacking proper fencing
>   so that would need to be implemented first)
> 
> - the migrate ioctl is disconnected from the svm part and
>   thus we would need first to implement svm reference counting
>   and take a reference at begining of migration process and
>   release it at the end ie struct nouveau_svmm needs refcounting
>   of some sort. I let Ben decides what he likes best for that.

Thinking more about that the svm lifetime is bound to the file
descriptor on the device driver file held by the process. So
when you call migrate ioctl the svm should not go away because
you are in an ioctl against the file descriptor. I need to double
check all that in respect of process that open the device file
multiple time with different file descriptor (or fork thing and
all).


> - i rather not have an extra pfns array, i am pretty sure we
>   can directly feed what we get from the dma array to the svm
>   code to update the GPU page table
> 
> Observation that can be delayed to latter patches:
> - i do not think we want to map directly if the dma engine
>   is queue up and thus if the fence will take time to signal
> 
>   This is why i did not implement this in the first place.
>   Maybe using a workqueue to schedule a pre-fill of the GPU
>   page table and wakeup the workqueue with the fence notify
>   event.
> 
> 
> > ---
> > 
> > This patch is based on top of Christoph Hellwig's 9 patch series
> > https://lore.kernel.org/linux-mm/20190729234611.gc7...@redhat.com/T/#u
> > "turn the hmm migrate_vma upside down" but without patch 9
> > "mm: remove the unused MIGRATE_PFN_WRITE" and adds a use for the flag.
> > 
> > 
> >  drivers/gpu/drm/nouveau/nouveau_dmem.c | 45 +-
> >  drivers/gpu/drm/nouveau/nouveau_svm.c  | 86 ++
> >  drivers/gpu/drm/nouveau/nouveau_svm.h  | 19 ++
> >  3 files changed, 133 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
> > b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> > index ef9de82b0744..c83e6f118817 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> > @@ -25,11 +25,13 @@
> >  #include "nouveau_dma.h"
> >  #include "nouveau_mem.h"
> >  #include "nouveau_bo.h"
> > +#include "nouveau_svm.h"
> >  
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -560,11 +562,12 @@ nouveau_dmem_init(struct nouveau_drm *drm)
> >  }
> >  
> >  static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
> > -   struct vm_area_struct *vma, unsigned long addr,
> > -   unsigned long src, dma_addr_t *dma_addr)
> > +   struct vm_area_struct *vma, unsigned long src,
> > +   dma_addr_t *dma_addr, u64 *pfn)
> >  {
> > struct device *dev = drm->dev->dev;
> > struct page *dpage, *spage;
> > +   unsigned long paddr;
> >  
> > spage = migrate_pfn_to_page(src);
> > if (!spage || !(src & MIGRATE_PFN_MIGRATE))
> > @@ -572,17 +575,21 @@ static unsigned long 
> > nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
> >  
> > dpage = nouveau_dmem_page_alloc_locked(drm);
> > if (!dpage)
> > -   return 0;
> > +   goto out;
> >  
> > *dma_addr = dma_map_page(dev, spage, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
> > if (dma_mapping_error(dev, *dma_addr))
> > goto out_free_page;
> >  
> > +   paddr = nouveau_dmem_page_addr(dpage);
> > if (drm->dmem->migrate.copy_func(drm, 1, NOUVEAU_APER_VRAM,
> > -   nouveau_dmem_page_addr(dpage), NOUVEAU_APER_HOST,
> > -   *dma_addr))
> > +   paddr, NOUVEAU_APER_HOST, *dma_addr))
> > goto out_dma_unmap;
> >  
> > +   *pfn = NVIF_VMM_PFNMAP_V0_V | NVIF_VMM_PFNMAP_V0_VRAM |
> > +   ((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
> > +   if (src & MIGRATE_PFN_WRITE)
> > +   *pfn |= NVIF_VMM_PFNMAP_V0_W;
> > return migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> >  
> >  out_dma_unmap:
> > @@ -590,18 +597,19 @@ static unsigned long 
> > nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
> >  out_free_page:
> > 

RE: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Mario.Limonciello
> -Original Message-
> From: Takashi Iwai 
> Sent: Thursday, August 15, 2019 9:57 AM
> To: Alex Deucher
> Cc: Karol Herbst; Limonciello, Mario; nouveau; Rafael J . Wysocki; LKML; 
> dri-devel;
> Linux ACPI Mailing List; Alex Hung; Ben Skeggs; David Airlie
> Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to
> enable dGPU direct output"
> 
> 
> [EXTERNAL EMAIL]
> 
> On Thu, 15 Aug 2019 16:37:05 +0200,
> Alex Deucher wrote:
> >
> > On Thu, Aug 15, 2019 at 10:25 AM Karol Herbst  wrote:
> > >
> > > On Thu, Aug 15, 2019 at 4:20 PM  wrote:
> > > >
> > > > > > There are definitely going to be regressions on machines in the 
> > > > > > field
> with the
> > > > > > in tree drivers by reverting this.  I think we should have an 
> > > > > > answer for all
> of
> > > > > those
> > > > > > before this revert is accepted.
> > > > > >
> > > > > > Regarding systems with Intel+NVIDIA, we'll have to work with 
> > > > > > partners
> to
> > > > > collect
> > > > > > some information on the impact of reverting this.
> > > > > >
> > > > > > When this is used on a system with Intel+AMD the ASL configures AMD
> GPU to
> > > > > use
> > > > > > "Hybrid Graphics" when on Windows and "Power Express" and
> "Switchable
> > > > > Graphics"
> > > > > > when on Linux.
> > > > >
> > > > > and what's exactly the difference between those? And what's the actual
> > > > > issue here?
> > > >
> > > > DP/HDMI is not detected unless plugged in at bootup.  It's due to 
> > > > missing
> HPD
> > > > events.
> > > >
> > >
> > > afaik Lyude was working on fixing all that, at least for some drivers.
> > > If there is something wrong, we still should fix the drivers, not
> > > adding ACPI workarounds.
> > >
> > > Alex: do you know if there are remaining issues regarding that with 
> > > amdgpu?
> >
> > There was an issue with hpd events not making it to the audio side
> > when things were powered down that was fixed with this patch set:
> > https://patchwork.freedesktop.org/patch/316793/
> > Those patches depended on a bunch of alsa changes as well which may
> > have not been available in the distro used for a particular OEM
> > program.
> 
> FYI, the corresponding commit for ALSA part is destined for 5.4
> kernel:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=ade
> 49db337a9d44ac5835cfce1ee873549011b27
> 
> BTW, Nouveau should suffer from the same problem.  The patch to add
> the audio component support is found at:
>   https://patchwork.freedesktop.org/patch/319131/
> 
> 

It sounds like 5.3rcX won't be a useful check then.

So am I correct to understand that everything related to the AMD failures
described in this thread should be in linux-next at this point?



Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Alex Deucher
On Thu, Aug 15, 2019 at 10:37 AM Alex Deucher  wrote:
>
> On Thu, Aug 15, 2019 at 10:25 AM Karol Herbst  wrote:
> >
> > On Thu, Aug 15, 2019 at 4:20 PM  wrote:
> > >
> > > > > There are definitely going to be regressions on machines in the field 
> > > > > with the
> > > > > in tree drivers by reverting this.  I think we should have an answer 
> > > > > for all of
> > > > those
> > > > > before this revert is accepted.
> > > > >
> > > > > Regarding systems with Intel+NVIDIA, we'll have to work with partners 
> > > > > to
> > > > collect
> > > > > some information on the impact of reverting this.
> > > > >
> > > > > When this is used on a system with Intel+AMD the ASL configures AMD 
> > > > > GPU to
> > > > use
> > > > > "Hybrid Graphics" when on Windows and "Power Express" and "Switchable
> > > > Graphics"
> > > > > when on Linux.
> > > >
> > > > and what's exactly the difference between those? And what's the actual
> > > > issue here?
> > >
> > > DP/HDMI is not detected unless plugged in at bootup.  It's due to missing 
> > > HPD
> > > events.
> > >
> >
> > afaik Lyude was working on fixing all that, at least for some drivers.
> > If there is something wrong, we still should fix the drivers, not
> > adding ACPI workarounds.
> >
> > Alex: do you know if there are remaining issues regarding that with amdgpu?
>
> There was an issue with hpd events not making it to the audio side
> when things were powered down that was fixed with this patch set:
> https://patchwork.freedesktop.org/patch/316793/
> Those patches depended on a bunch of alsa changes as well which may
> have not been available in the distro used for a particular OEM
> program.
>
> >
> > > >
> > > > We already have the PRIME offloading in place and if that's not
> > > > enough, we should work on extending it, not adding some ACPI based
> > > > workarounds, because that's exactly how that looks like.
> > > >
> > > > Also, was this discussed with anybody involved in the drm subsystem?
> > > >
> > > > >
> > > > > I feel we need a knob and/or DMI detection to affect the changes that 
> > > > > the ASL
> > > > > normally performs.
> > > >
> > > > Why do we have to do that on a firmware level at all?
> > >
> > > Folks from AMD Graphics team recommended this approach.  From their 
> > > perspective
> > > it's not a workaround.  They view this as a different architecture for 
> > > AMD graphics driver on
> > > Windows and AMD graphics w/ amdgpu driver.  They have different ASL paths 
> > > used for
> > > each.
> >
> > @alex: is this true?
>
> I'm not familiar with this patches in particular, but I know we've
> done things with OEM programs to support Linux on platforms where
> Linux support is lacking for in new features for the target distros.
> E.g., when the first hybrid graphics laptops were coming out, Linux
> didn't support it too well or at all depending on the timing, so the
> bios exposed power express which was working well at the time if the
> OS told ACPI it was Linux.

FWIW, windows does something similar.  I don't think windows 7
supports hybrid graphics either so if the OS tells ACPI it's windows
7, it gets power express instead of hybrid graphics as well.  At least
on laptops that support windows 7 in the first place.

Alex

>
> Alex
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Takashi Iwai
On Thu, 15 Aug 2019 16:37:05 +0200,
Alex Deucher wrote:
> 
> On Thu, Aug 15, 2019 at 10:25 AM Karol Herbst  wrote:
> >
> > On Thu, Aug 15, 2019 at 4:20 PM  wrote:
> > >
> > > > > There are definitely going to be regressions on machines in the field 
> > > > > with the
> > > > > in tree drivers by reverting this.  I think we should have an answer 
> > > > > for all of
> > > > those
> > > > > before this revert is accepted.
> > > > >
> > > > > Regarding systems with Intel+NVIDIA, we'll have to work with partners 
> > > > > to
> > > > collect
> > > > > some information on the impact of reverting this.
> > > > >
> > > > > When this is used on a system with Intel+AMD the ASL configures AMD 
> > > > > GPU to
> > > > use
> > > > > "Hybrid Graphics" when on Windows and "Power Express" and "Switchable
> > > > Graphics"
> > > > > when on Linux.
> > > >
> > > > and what's exactly the difference between those? And what's the actual
> > > > issue here?
> > >
> > > DP/HDMI is not detected unless plugged in at bootup.  It's due to missing 
> > > HPD
> > > events.
> > >
> >
> > afaik Lyude was working on fixing all that, at least for some drivers.
> > If there is something wrong, we still should fix the drivers, not
> > adding ACPI workarounds.
> >
> > Alex: do you know if there are remaining issues regarding that with amdgpu?
> 
> There was an issue with hpd events not making it to the audio side
> when things were powered down that was fixed with this patch set:
> https://patchwork.freedesktop.org/patch/316793/
> Those patches depended on a bunch of alsa changes as well which may
> have not been available in the distro used for a particular OEM
> program.

FYI, the corresponding commit for ALSA part is destined for 5.4
kernel: 
  
https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?id=ade49db337a9d44ac5835cfce1ee873549011b27

BTW, Nouveau should suffer from the same problem.  The patch to add
the audio component support is found at:
  https://patchwork.freedesktop.org/patch/319131/


thanks,

Takashi
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Alex Deucher
On Thu, Aug 15, 2019 at 10:30 AM  wrote:
>
> > On Thu, Aug 15, 2019 at 10:15 AM Karol Herbst  wrote:
> > >
> > > On Thu, Aug 15, 2019 at 4:13 PM Alex Deucher 
> > wrote:
> > > >
> > > > On Thu, Aug 15, 2019 at 10:04 AM Karol Herbst  
> > > > wrote:
> > > > >
> > > > > On Thu, Aug 15, 2019 at 3:56 PM  wrote:
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: linux-acpi-ow...@vger.kernel.org  > ow...@vger.kernel.org> On
> > > > > > > Behalf Of Dave Airlie
> > > > > > > Sent: Wednesday, August 14, 2019 5:48 PM
> > > > > > > To: Karol Herbst
> > > > > > > Cc: LKML; Linux ACPI; dri-devel; nouveau; Rafael J . Wysocki; 
> > > > > > > Alex Hung;
> > Ben
> > > > > > > Skeggs; Dave Airlie
> > > > > > > Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM 
> > > > > > > _OSI
> > string to
> > > > > > > enable dGPU direct output"
> > > > > > >
> > > > > > > On Thu, 15 Aug 2019 at 07:31, Karol Herbst 
> > wrote:
> > > > > > > >
> > > > > > > > This reverts commit 28586a51eea666d5531bcaef2f68e4abbd87242c.
> > > > > > > >
> > > > > > > > The original commit message didn't even make sense. AMD _does_
> > support it and
> > > > > > > > it works with Nouveau as well.
> > > > > > > >
> > > > > > > > Also what was the issue being solved here? No references to any 
> > > > > > > > bugs
> > and not
> > > > > > > > even explaining any issue at all isn't the way we do things.
> > > > > > > >
> > > > > > > > And even if it means a muxed design, then the fix is to make it 
> > > > > > > > work
> > inside the
> > > > > > > > driver, not adding some hacky workaround through ACPI tricks.
> > > > > > > >
> > > > > > > > And what out of tree drivers do or do not support we don't care 
> > > > > > > > one
> > bit anyway.
> > > > > > > >
> > > > > > >
> > > > > > > I think the reverts should be merged via Rafael's tree as the 
> > > > > > > original
> > > > > > > patches went in via there, and we should get them in asap.
> > > > > > >
> > > > > > > Acked-by: Dave Airlie 
> > > > > > > Dave.
> > > > > >
> > > > > > There are definitely going to be regressions on machines in the 
> > > > > > field with
> > the
> > > > > > in tree drivers by reverting this.  I think we should have an 
> > > > > > answer for all
> > of those
> > > > > > before this revert is accepted.
> > > > > >
> > > > > > Regarding systems with Intel+NVIDIA, we'll have to work with 
> > > > > > partners to
> > collect
> > > > > > some information on the impact of reverting this.
> > > > > >
> > > > > > When this is used on a system with Intel+AMD the ASL configures AMD
> > GPU to use
> > > > > > "Hybrid Graphics" when on Windows and "Power Express" and
> > "Switchable Graphics"
> > > > > > when on Linux.
> > > > >
> > > > > and what's exactly the difference between those? And what's the actual
> > > > > issue here?
> > > >
> > > > Hybrid Graphics is the new "standard" way of handling these laptops.
> > > > It uses the standard _PR3 APCI method to handle dGPU power.  Support
> > > > for this was added to Linux relatively later compared to when the
> > > > laptops were launched.  "Power Express" used the other AMD specific
> > > > ATPX ACPI method to handle dGPU power.  The driver supports both so
> > > > either method will work.
> > > >
> > > > Alex
> > > >
> > >
> > > thanks for clarifying. But that still means that we won't need such
> > > workarounds for AMD users, right? amdgpu handles hybrid graphics just
> > > fine, right?
> >
> > Yeah it should, assuming you have a new enough kernel which supports
> > HG, which has been several years at this point IIRC.
> >
> > Alex
> >
>
> Can you define how new of a kernel is a new enough kernel?
>
> Looking on my side these problems were on new hardware (Precision 7740) and
> are checked as recently as start of this summer, w/ kernel 4.15.

I don't remember off hand.  It was before the _PR3 support in the pci
subsystem was upstream.  What was the problem you were seeing with
this system?

Alex

>
> We can arrange to have it checked again on 5.3rcX w/ the OSI disabled.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Alex Deucher
On Thu, Aug 15, 2019 at 10:25 AM Karol Herbst  wrote:
>
> On Thu, Aug 15, 2019 at 4:20 PM  wrote:
> >
> > > > There are definitely going to be regressions on machines in the field 
> > > > with the
> > > > in tree drivers by reverting this.  I think we should have an answer 
> > > > for all of
> > > those
> > > > before this revert is accepted.
> > > >
> > > > Regarding systems with Intel+NVIDIA, we'll have to work with partners to
> > > collect
> > > > some information on the impact of reverting this.
> > > >
> > > > When this is used on a system with Intel+AMD the ASL configures AMD GPU 
> > > > to
> > > use
> > > > "Hybrid Graphics" when on Windows and "Power Express" and "Switchable
> > > Graphics"
> > > > when on Linux.
> > >
> > > and what's exactly the difference between those? And what's the actual
> > > issue here?
> >
> > DP/HDMI is not detected unless plugged in at bootup.  It's due to missing 
> > HPD
> > events.
> >
>
> afaik Lyude was working on fixing all that, at least for some drivers.
> If there is something wrong, we still should fix the drivers, not
> adding ACPI workarounds.
>
> Alex: do you know if there are remaining issues regarding that with amdgpu?

There was an issue with hpd events not making it to the audio side
when things were powered down that was fixed with this patch set:
https://patchwork.freedesktop.org/patch/316793/
Those patches depended on a bunch of alsa changes as well which may
have not been available in the distro used for a particular OEM
program.

>
> > >
> > > We already have the PRIME offloading in place and if that's not
> > > enough, we should work on extending it, not adding some ACPI based
> > > workarounds, because that's exactly how that looks like.
> > >
> > > Also, was this discussed with anybody involved in the drm subsystem?
> > >
> > > >
> > > > I feel we need a knob and/or DMI detection to affect the changes that 
> > > > the ASL
> > > > normally performs.
> > >
> > > Why do we have to do that on a firmware level at all?
> >
> > Folks from AMD Graphics team recommended this approach.  From their 
> > perspective
> > it's not a workaround.  They view this as a different architecture for AMD 
> > graphics driver on
> > Windows and AMD graphics w/ amdgpu driver.  They have different ASL paths 
> > used for
> > each.
>
> @alex: is this true?

I'm not familiar with this patches in particular, but I know we've
done things with OEM programs to support Linux on platforms where
Linux support is lacking for in new features for the target distros.
E.g., when the first hybrid graphics laptops were coming out, Linux
didn't support it too well or at all depending on the timing, so the
bios exposed power express which was working well at the time if the
OS told ACPI it was Linux.

Alex


Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Karol Herbst
On Thu, Aug 15, 2019 at 4:34 PM  wrote:
>
> > -Original Message-
> > From: Karol Herbst 
> > Sent: Thursday, August 15, 2019 9:25 AM
> > To: Limonciello, Mario
> > Cc: Dave Airlie; LKML; Linux ACPI Mailing List; dri-devel; nouveau; Rafael 
> > J .
> > Wysocki; Alex Hung; Ben Skeggs; David Airlie
> > Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string 
> > to
> > enable dGPU direct output"
> >
> >
> > [EXTERNAL EMAIL]
> >
> > On Thu, Aug 15, 2019 at 4:20 PM  wrote:
> > >
> > > > > There are definitely going to be regressions on machines in the field 
> > > > > with
> > the
> > > > > in tree drivers by reverting this.  I think we should have an answer 
> > > > > for all of
> > > > those
> > > > > before this revert is accepted.
> > > > >
> > > > > Regarding systems with Intel+NVIDIA, we'll have to work with partners 
> > > > > to
> > > > collect
> > > > > some information on the impact of reverting this.
> > > > >
> > > > > When this is used on a system with Intel+AMD the ASL configures AMD
> > GPU to
> > > > use
> > > > > "Hybrid Graphics" when on Windows and "Power Express" and "Switchable
> > > > Graphics"
> > > > > when on Linux.
> > > >
> > > > and what's exactly the difference between those? And what's the actual
> > > > issue here?
> > >
> > > DP/HDMI is not detected unless plugged in at bootup.  It's due to missing 
> > > HPD
> > > events.
> > >
> >
> > afaik Lyude was working on fixing all that, at least for some drivers.
> > If there is something wrong, we still should fix the drivers, not
> > adding ACPI workarounds.
>
> I don't disagree, but timing is frequently a limitation if you want the 
> hardware to
> work when you put it on the market.
>
> The whole idea behind the OSI string was that it could be reverted when the 
> time
> was right.  From this discussion it very well may be for systems with AMD 
> GPU, but
> it needs to be checked again.
>
> >
> > Alex: do you know if there are remaining issues regarding that with amdgpu?
> >
> > > >
> > > > We already have the PRIME offloading in place and if that's not
> > > > enough, we should work on extending it, not adding some ACPI based
> > > > workarounds, because that's exactly how that looks like.
> > > >
> > > > Also, was this discussed with anybody involved in the drm subsystem?
> > > >
> > > > >
> > > > > I feel we need a knob and/or DMI detection to affect the changes that 
> > > > > the
> > ASL
> > > > > normally performs.
> > > >
> > > > Why do we have to do that on a firmware level at all?
> > >
> > > Folks from AMD Graphics team recommended this approach.
>
> I should clarify this is from the folks on AMD graphics team that interact to 
> OEMs
> like Dell which are not necessarily the same folks who work on the drivers 
> directly.
>

yeah, I understand the pressure, and it might have been fine if we
would have this discussion upstream, if it's about pushing things
upstream. The commits itself didn't even make the impression that
anybody with the drm drivers involved even looked at those patches and
this is the worst part of those commits.

> > From their perspective
> > > it's not a workaround.  They view this as a different architecture for AMD
> > graphics driver on
> > > Windows and AMD graphics w/ amdgpu driver.  They have different ASL paths
> > used for
> > > each.
> >
> > @alex: is this true?


Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Karol Herbst
On Thu, Aug 15, 2019 at 4:30 PM  wrote:
>
> > On Thu, Aug 15, 2019 at 10:15 AM Karol Herbst  wrote:
> > >
> > > On Thu, Aug 15, 2019 at 4:13 PM Alex Deucher 
> > wrote:
> > > >
> > > > On Thu, Aug 15, 2019 at 10:04 AM Karol Herbst  
> > > > wrote:
> > > > >
> > > > > On Thu, Aug 15, 2019 at 3:56 PM  wrote:
> > > > > >
> > > > > > > -Original Message-
> > > > > > > From: linux-acpi-ow...@vger.kernel.org  > ow...@vger.kernel.org> On
> > > > > > > Behalf Of Dave Airlie
> > > > > > > Sent: Wednesday, August 14, 2019 5:48 PM
> > > > > > > To: Karol Herbst
> > > > > > > Cc: LKML; Linux ACPI; dri-devel; nouveau; Rafael J . Wysocki; 
> > > > > > > Alex Hung;
> > Ben
> > > > > > > Skeggs; Dave Airlie
> > > > > > > Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM 
> > > > > > > _OSI
> > string to
> > > > > > > enable dGPU direct output"
> > > > > > >
> > > > > > > On Thu, 15 Aug 2019 at 07:31, Karol Herbst 
> > wrote:
> > > > > > > >
> > > > > > > > This reverts commit 28586a51eea666d5531bcaef2f68e4abbd87242c.
> > > > > > > >
> > > > > > > > The original commit message didn't even make sense. AMD _does_
> > support it and
> > > > > > > > it works with Nouveau as well.
> > > > > > > >
> > > > > > > > Also what was the issue being solved here? No references to any 
> > > > > > > > bugs
> > and not
> > > > > > > > even explaining any issue at all isn't the way we do things.
> > > > > > > >
> > > > > > > > And even if it means a muxed design, then the fix is to make it 
> > > > > > > > work
> > inside the
> > > > > > > > driver, not adding some hacky workaround through ACPI tricks.
> > > > > > > >
> > > > > > > > And what out of tree drivers do or do not support we don't care 
> > > > > > > > one
> > bit anyway.
> > > > > > > >
> > > > > > >
> > > > > > > I think the reverts should be merged via Rafael's tree as the 
> > > > > > > original
> > > > > > > patches went in via there, and we should get them in asap.
> > > > > > >
> > > > > > > Acked-by: Dave Airlie 
> > > > > > > Dave.
> > > > > >
> > > > > > There are definitely going to be regressions on machines in the 
> > > > > > field with
> > the
> > > > > > in tree drivers by reverting this.  I think we should have an 
> > > > > > answer for all
> > of those
> > > > > > before this revert is accepted.
> > > > > >
> > > > > > Regarding systems with Intel+NVIDIA, we'll have to work with 
> > > > > > partners to
> > collect
> > > > > > some information on the impact of reverting this.
> > > > > >
> > > > > > When this is used on a system with Intel+AMD the ASL configures AMD
> > GPU to use
> > > > > > "Hybrid Graphics" when on Windows and "Power Express" and
> > "Switchable Graphics"
> > > > > > when on Linux.
> > > > >
> > > > > and what's exactly the difference between those? And what's the actual
> > > > > issue here?
> > > >
> > > > Hybrid Graphics is the new "standard" way of handling these laptops.
> > > > It uses the standard _PR3 APCI method to handle dGPU power.  Support
> > > > for this was added to Linux relatively later compared to when the
> > > > laptops were launched.  "Power Express" used the other AMD specific
> > > > ATPX ACPI method to handle dGPU power.  The driver supports both so
> > > > either method will work.
> > > >
> > > > Alex
> > > >
> > >
> > > thanks for clarifying. But that still means that we won't need such
> > > workarounds for AMD users, right? amdgpu handles hybrid graphics just
> > > fine, right?
> >
> > Yeah it should, assuming you have a new enough kernel which supports
> > HG, which has been several years at this point IIRC.
> >
> > Alex
> >
>
> Can you define how new of a kernel is a new enough kernel?
>
> Looking on my side these problems were on new hardware (Precision 7740) and
> are checked as recently as start of this summer, w/ kernel 4.15.

That's not even a long term one. And it shouldn't get any fixes. I
just checked, last update was 16 months ago.

>
> We can arrange to have it checked again on 5.3rcX w/ the OSI disabled.

yeah, please do. If there are any issues, we (as in drm developers)
are happy to fix the issues inside the drivers.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

RE: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Mario.Limonciello
> -Original Message-
> From: Karol Herbst 
> Sent: Thursday, August 15, 2019 9:25 AM
> To: Limonciello, Mario
> Cc: Dave Airlie; LKML; Linux ACPI Mailing List; dri-devel; nouveau; Rafael J .
> Wysocki; Alex Hung; Ben Skeggs; David Airlie
> Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to
> enable dGPU direct output"
> 
> 
> [EXTERNAL EMAIL]
> 
> On Thu, Aug 15, 2019 at 4:20 PM  wrote:
> >
> > > > There are definitely going to be regressions on machines in the field 
> > > > with
> the
> > > > in tree drivers by reverting this.  I think we should have an answer 
> > > > for all of
> > > those
> > > > before this revert is accepted.
> > > >
> > > > Regarding systems with Intel+NVIDIA, we'll have to work with partners to
> > > collect
> > > > some information on the impact of reverting this.
> > > >
> > > > When this is used on a system with Intel+AMD the ASL configures AMD
> GPU to
> > > use
> > > > "Hybrid Graphics" when on Windows and "Power Express" and "Switchable
> > > Graphics"
> > > > when on Linux.
> > >
> > > and what's exactly the difference between those? And what's the actual
> > > issue here?
> >
> > DP/HDMI is not detected unless plugged in at bootup.  It's due to missing 
> > HPD
> > events.
> >
> 
> afaik Lyude was working on fixing all that, at least for some drivers.
> If there is something wrong, we still should fix the drivers, not
> adding ACPI workarounds.

I don't disagree, but timing is frequently a limitation if you want the 
hardware to
work when you put it on the market.

The whole idea behind the OSI string was that it could be reverted when the time
was right.  From this discussion it very well may be for systems with AMD GPU, 
but
it needs to be checked again.

> 
> Alex: do you know if there are remaining issues regarding that with amdgpu?
> 
> > >
> > > We already have the PRIME offloading in place and if that's not
> > > enough, we should work on extending it, not adding some ACPI based
> > > workarounds, because that's exactly how that looks like.
> > >
> > > Also, was this discussed with anybody involved in the drm subsystem?
> > >
> > > >
> > > > I feel we need a knob and/or DMI detection to affect the changes that 
> > > > the
> ASL
> > > > normally performs.
> > >
> > > Why do we have to do that on a firmware level at all?
> >
> > Folks from AMD Graphics team recommended this approach.  

I should clarify this is from the folks on AMD graphics team that interact to 
OEMs
like Dell which are not necessarily the same folks who work on the drivers 
directly.

> From their perspective
> > it's not a workaround.  They view this as a different architecture for AMD
> graphics driver on
> > Windows and AMD graphics w/ amdgpu driver.  They have different ASL paths
> used for
> > each.
> 
> @alex: is this true?


RE: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Mario.Limonciello
> On Thu, Aug 15, 2019 at 10:15 AM Karol Herbst  wrote:
> >
> > On Thu, Aug 15, 2019 at 4:13 PM Alex Deucher 
> wrote:
> > >
> > > On Thu, Aug 15, 2019 at 10:04 AM Karol Herbst  wrote:
> > > >
> > > > On Thu, Aug 15, 2019 at 3:56 PM  wrote:
> > > > >
> > > > > > -Original Message-
> > > > > > From: linux-acpi-ow...@vger.kernel.org  ow...@vger.kernel.org> On
> > > > > > Behalf Of Dave Airlie
> > > > > > Sent: Wednesday, August 14, 2019 5:48 PM
> > > > > > To: Karol Herbst
> > > > > > Cc: LKML; Linux ACPI; dri-devel; nouveau; Rafael J . Wysocki; Alex 
> > > > > > Hung;
> Ben
> > > > > > Skeggs; Dave Airlie
> > > > > > Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI
> string to
> > > > > > enable dGPU direct output"
> > > > > >
> > > > > > On Thu, 15 Aug 2019 at 07:31, Karol Herbst 
> wrote:
> > > > > > >
> > > > > > > This reverts commit 28586a51eea666d5531bcaef2f68e4abbd87242c.
> > > > > > >
> > > > > > > The original commit message didn't even make sense. AMD _does_
> support it and
> > > > > > > it works with Nouveau as well.
> > > > > > >
> > > > > > > Also what was the issue being solved here? No references to any 
> > > > > > > bugs
> and not
> > > > > > > even explaining any issue at all isn't the way we do things.
> > > > > > >
> > > > > > > And even if it means a muxed design, then the fix is to make it 
> > > > > > > work
> inside the
> > > > > > > driver, not adding some hacky workaround through ACPI tricks.
> > > > > > >
> > > > > > > And what out of tree drivers do or do not support we don't care 
> > > > > > > one
> bit anyway.
> > > > > > >
> > > > > >
> > > > > > I think the reverts should be merged via Rafael's tree as the 
> > > > > > original
> > > > > > patches went in via there, and we should get them in asap.
> > > > > >
> > > > > > Acked-by: Dave Airlie 
> > > > > > Dave.
> > > > >
> > > > > There are definitely going to be regressions on machines in the field 
> > > > > with
> the
> > > > > in tree drivers by reverting this.  I think we should have an answer 
> > > > > for all
> of those
> > > > > before this revert is accepted.
> > > > >
> > > > > Regarding systems with Intel+NVIDIA, we'll have to work with partners 
> > > > > to
> collect
> > > > > some information on the impact of reverting this.
> > > > >
> > > > > When this is used on a system with Intel+AMD the ASL configures AMD
> GPU to use
> > > > > "Hybrid Graphics" when on Windows and "Power Express" and
> "Switchable Graphics"
> > > > > when on Linux.
> > > >
> > > > and what's exactly the difference between those? And what's the actual
> > > > issue here?
> > >
> > > Hybrid Graphics is the new "standard" way of handling these laptops.
> > > It uses the standard _PR3 APCI method to handle dGPU power.  Support
> > > for this was added to Linux relatively later compared to when the
> > > laptops were launched.  "Power Express" used the other AMD specific
> > > ATPX ACPI method to handle dGPU power.  The driver supports both so
> > > either method will work.
> > >
> > > Alex
> > >
> >
> > thanks for clarifying. But that still means that we won't need such
> > workarounds for AMD users, right? amdgpu handles hybrid graphics just
> > fine, right?
> 
> Yeah it should, assuming you have a new enough kernel which supports
> HG, which has been several years at this point IIRC.
> 
> Alex
> 

Can you define how new of a kernel is a new enough kernel?

Looking on my side these problems were on new hardware (Precision 7740) and
are checked as recently as start of this summer, w/ kernel 4.15.

We can arrange to have it checked again on 5.3rcX w/ the OSI disabled.


Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Karol Herbst
On Thu, Aug 15, 2019 at 4:20 PM  wrote:
>
> > > There are definitely going to be regressions on machines in the field 
> > > with the
> > > in tree drivers by reverting this.  I think we should have an answer for 
> > > all of
> > those
> > > before this revert is accepted.
> > >
> > > Regarding systems with Intel+NVIDIA, we'll have to work with partners to
> > collect
> > > some information on the impact of reverting this.
> > >
> > > When this is used on a system with Intel+AMD the ASL configures AMD GPU to
> > use
> > > "Hybrid Graphics" when on Windows and "Power Express" and "Switchable
> > Graphics"
> > > when on Linux.
> >
> > and what's exactly the difference between those? And what's the actual
> > issue here?
>
> DP/HDMI is not detected unless plugged in at bootup.  It's due to missing HPD
> events.
>

afaik Lyude was working on fixing all that, at least for some drivers.
If there is something wrong, we still should fix the drivers, not
adding ACPI workarounds.

Alex: do you know if there are remaining issues regarding that with amdgpu?

> >
> > We already have the PRIME offloading in place and if that's not
> > enough, we should work on extending it, not adding some ACPI based
> > workarounds, because that's exactly how that looks like.
> >
> > Also, was this discussed with anybody involved in the drm subsystem?
> >
> > >
> > > I feel we need a knob and/or DMI detection to affect the changes that the 
> > > ASL
> > > normally performs.
> >
> > Why do we have to do that on a firmware level at all?
>
> Folks from AMD Graphics team recommended this approach.  From their 
> perspective
> it's not a workaround.  They view this as a different architecture for AMD 
> graphics driver on
> Windows and AMD graphics w/ amdgpu driver.  They have different ASL paths 
> used for
> each.

@alex: is this true?


RE: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Mario.Limonciello
> > There are definitely going to be regressions on machines in the field with 
> > the
> > in tree drivers by reverting this.  I think we should have an answer for 
> > all of
> those
> > before this revert is accepted.
> >
> > Regarding systems with Intel+NVIDIA, we'll have to work with partners to
> collect
> > some information on the impact of reverting this.
> >
> > When this is used on a system with Intel+AMD the ASL configures AMD GPU to
> use
> > "Hybrid Graphics" when on Windows and "Power Express" and "Switchable
> Graphics"
> > when on Linux.
> 
> and what's exactly the difference between those? And what's the actual
> issue here?

DP/HDMI is not detected unless plugged in at bootup.  It's due to missing HPD
events.

> 
> We already have the PRIME offloading in place and if that's not
> enough, we should work on extending it, not adding some ACPI based
> workarounds, because that's exactly how that looks like.
> 
> Also, was this discussed with anybody involved in the drm subsystem?
> 
> >
> > I feel we need a knob and/or DMI detection to affect the changes that the 
> > ASL
> > normally performs.
> 
> Why do we have to do that on a firmware level at all?

Folks from AMD Graphics team recommended this approach.  From their perspective
it's not a workaround.  They view this as a different architecture for AMD 
graphics driver on
Windows and AMD graphics w/ amdgpu driver.  They have different ASL paths used 
for
each.


Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Alex Deucher
On Thu, Aug 15, 2019 at 10:15 AM Karol Herbst  wrote:
>
> On Thu, Aug 15, 2019 at 4:13 PM Alex Deucher  wrote:
> >
> > On Thu, Aug 15, 2019 at 10:04 AM Karol Herbst  wrote:
> > >
> > > On Thu, Aug 15, 2019 at 3:56 PM  wrote:
> > > >
> > > > > -Original Message-
> > > > > From: linux-acpi-ow...@vger.kernel.org 
> > > > >  On
> > > > > Behalf Of Dave Airlie
> > > > > Sent: Wednesday, August 14, 2019 5:48 PM
> > > > > To: Karol Herbst
> > > > > Cc: LKML; Linux ACPI; dri-devel; nouveau; Rafael J . Wysocki; Alex 
> > > > > Hung; Ben
> > > > > Skeggs; Dave Airlie
> > > > > Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI 
> > > > > string to
> > > > > enable dGPU direct output"
> > > > >
> > > > > On Thu, 15 Aug 2019 at 07:31, Karol Herbst  wrote:
> > > > > >
> > > > > > This reverts commit 28586a51eea666d5531bcaef2f68e4abbd87242c.
> > > > > >
> > > > > > The original commit message didn't even make sense. AMD _does_ 
> > > > > > support it and
> > > > > > it works with Nouveau as well.
> > > > > >
> > > > > > Also what was the issue being solved here? No references to any 
> > > > > > bugs and not
> > > > > > even explaining any issue at all isn't the way we do things.
> > > > > >
> > > > > > And even if it means a muxed design, then the fix is to make it 
> > > > > > work inside the
> > > > > > driver, not adding some hacky workaround through ACPI tricks.
> > > > > >
> > > > > > And what out of tree drivers do or do not support we don't care one 
> > > > > > bit anyway.
> > > > > >
> > > > >
> > > > > I think the reverts should be merged via Rafael's tree as the original
> > > > > patches went in via there, and we should get them in asap.
> > > > >
> > > > > Acked-by: Dave Airlie 
> > > > > Dave.
> > > >
> > > > There are definitely going to be regressions on machines in the field 
> > > > with the
> > > > in tree drivers by reverting this.  I think we should have an answer 
> > > > for all of those
> > > > before this revert is accepted.
> > > >
> > > > Regarding systems with Intel+NVIDIA, we'll have to work with partners 
> > > > to collect
> > > > some information on the impact of reverting this.
> > > >
> > > > When this is used on a system with Intel+AMD the ASL configures AMD GPU 
> > > > to use
> > > > "Hybrid Graphics" when on Windows and "Power Express" and "Switchable 
> > > > Graphics"
> > > > when on Linux.
> > >
> > > and what's exactly the difference between those? And what's the actual
> > > issue here?
> >
> > Hybrid Graphics is the new "standard" way of handling these laptops.
> > It uses the standard _PR3 APCI method to handle dGPU power.  Support
> > for this was added to Linux relatively later compared to when the
> > laptops were launched.  "Power Express" used the other AMD specific
> > ATPX ACPI method to handle dGPU power.  The driver supports both so
> > either method will work.
> >
> > Alex
> >
>
> thanks for clarifying. But that still means that we won't need such
> workarounds for AMD users, right? amdgpu handles hybrid graphics just
> fine, right?

Yeah it should, assuming you have a new enough kernel which supports
HG, which has been several years at this point IIRC.

Alex

>
> > >
> > > We already have the PRIME offloading in place and if that's not
> > > enough, we should work on extending it, not adding some ACPI based
> > > workarounds, because that's exactly how that looks like.
> > >
> > > Also, was this discussed with anybody involved in the drm subsystem?
> > >
> > > >
> > > > I feel we need a knob and/or DMI detection to affect the changes that 
> > > > the ASL
> > > > normally performs.
> > >
> > > Why do we have to do that on a firmware level at all?
> > > ___
> > > dri-devel mailing list
> > > dri-de...@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Karol Herbst
On Thu, Aug 15, 2019 at 4:13 PM Alex Deucher  wrote:
>
> On Thu, Aug 15, 2019 at 10:04 AM Karol Herbst  wrote:
> >
> > On Thu, Aug 15, 2019 at 3:56 PM  wrote:
> > >
> > > > -Original Message-
> > > > From: linux-acpi-ow...@vger.kernel.org 
> > > >  On
> > > > Behalf Of Dave Airlie
> > > > Sent: Wednesday, August 14, 2019 5:48 PM
> > > > To: Karol Herbst
> > > > Cc: LKML; Linux ACPI; dri-devel; nouveau; Rafael J . Wysocki; Alex 
> > > > Hung; Ben
> > > > Skeggs; Dave Airlie
> > > > Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI 
> > > > string to
> > > > enable dGPU direct output"
> > > >
> > > > On Thu, 15 Aug 2019 at 07:31, Karol Herbst  wrote:
> > > > >
> > > > > This reverts commit 28586a51eea666d5531bcaef2f68e4abbd87242c.
> > > > >
> > > > > The original commit message didn't even make sense. AMD _does_ 
> > > > > support it and
> > > > > it works with Nouveau as well.
> > > > >
> > > > > Also what was the issue being solved here? No references to any bugs 
> > > > > and not
> > > > > even explaining any issue at all isn't the way we do things.
> > > > >
> > > > > And even if it means a muxed design, then the fix is to make it work 
> > > > > inside the
> > > > > driver, not adding some hacky workaround through ACPI tricks.
> > > > >
> > > > > And what out of tree drivers do or do not support we don't care one 
> > > > > bit anyway.
> > > > >
> > > >
> > > > I think the reverts should be merged via Rafael's tree as the original
> > > > patches went in via there, and we should get them in asap.
> > > >
> > > > Acked-by: Dave Airlie 
> > > > Dave.
> > >
> > > There are definitely going to be regressions on machines in the field 
> > > with the
> > > in tree drivers by reverting this.  I think we should have an answer for 
> > > all of those
> > > before this revert is accepted.
> > >
> > > Regarding systems with Intel+NVIDIA, we'll have to work with partners to 
> > > collect
> > > some information on the impact of reverting this.
> > >
> > > When this is used on a system with Intel+AMD the ASL configures AMD GPU 
> > > to use
> > > "Hybrid Graphics" when on Windows and "Power Express" and "Switchable 
> > > Graphics"
> > > when on Linux.
> >
> > and what's exactly the difference between those? And what's the actual
> > issue here?
>
> Hybrid Graphics is the new "standard" way of handling these laptops.
> It uses the standard _PR3 APCI method to handle dGPU power.  Support
> for this was added to Linux relatively later compared to when the
> laptops were launched.  "Power Express" used the other AMD specific
> ATPX ACPI method to handle dGPU power.  The driver supports both so
> either method will work.
>
> Alex
>

thanks for clarifying. But that still means that we won't need such
workarounds for AMD users, right? amdgpu handles hybrid graphics just
fine, right?

> >
> > We already have the PRIME offloading in place and if that's not
> > enough, we should work on extending it, not adding some ACPI based
> > workarounds, because that's exactly how that looks like.
> >
> > Also, was this discussed with anybody involved in the drm subsystem?
> >
> > >
> > > I feel we need a knob and/or DMI detection to affect the changes that the 
> > > ASL
> > > normally performs.
> >
> > Why do we have to do that on a firmware level at all?
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Alex Deucher
On Thu, Aug 15, 2019 at 10:04 AM Karol Herbst  wrote:
>
> On Thu, Aug 15, 2019 at 3:56 PM  wrote:
> >
> > > -Original Message-
> > > From: linux-acpi-ow...@vger.kernel.org  
> > > On
> > > Behalf Of Dave Airlie
> > > Sent: Wednesday, August 14, 2019 5:48 PM
> > > To: Karol Herbst
> > > Cc: LKML; Linux ACPI; dri-devel; nouveau; Rafael J . Wysocki; Alex Hung; 
> > > Ben
> > > Skeggs; Dave Airlie
> > > Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI 
> > > string to
> > > enable dGPU direct output"
> > >
> > > On Thu, 15 Aug 2019 at 07:31, Karol Herbst  wrote:
> > > >
> > > > This reverts commit 28586a51eea666d5531bcaef2f68e4abbd87242c.
> > > >
> > > > The original commit message didn't even make sense. AMD _does_ support 
> > > > it and
> > > > it works with Nouveau as well.
> > > >
> > > > Also what was the issue being solved here? No references to any bugs 
> > > > and not
> > > > even explaining any issue at all isn't the way we do things.
> > > >
> > > > And even if it means a muxed design, then the fix is to make it work 
> > > > inside the
> > > > driver, not adding some hacky workaround through ACPI tricks.
> > > >
> > > > And what out of tree drivers do or do not support we don't care one bit 
> > > > anyway.
> > > >
> > >
> > > I think the reverts should be merged via Rafael's tree as the original
> > > patches went in via there, and we should get them in asap.
> > >
> > > Acked-by: Dave Airlie 
> > > Dave.
> >
> > There are definitely going to be regressions on machines in the field with 
> > the
> > in tree drivers by reverting this.  I think we should have an answer for 
> > all of those
> > before this revert is accepted.
> >
> > Regarding systems with Intel+NVIDIA, we'll have to work with partners to 
> > collect
> > some information on the impact of reverting this.
> >
> > When this is used on a system with Intel+AMD the ASL configures AMD GPU to 
> > use
> > "Hybrid Graphics" when on Windows and "Power Express" and "Switchable 
> > Graphics"
> > when on Linux.
>
> and what's exactly the difference between those? And what's the actual
> issue here?

Hybrid Graphics is the new "standard" way of handling these laptops.
It uses the standard _PR3 APCI method to handle dGPU power.  Support
for this was added to Linux relatively later compared to when the
laptops were launched.  "Power Express" used the other AMD specific
ATPX ACPI method to handle dGPU power.  The driver supports both so
either method will work.

Alex

>
> We already have the PRIME offloading in place and if that's not
> enough, we should work on extending it, not adding some ACPI based
> workarounds, because that's exactly how that looks like.
>
> Also, was this discussed with anybody involved in the drm subsystem?
>
> >
> > I feel we need a knob and/or DMI detection to affect the changes that the 
> > ASL
> > normally performs.
>
> Why do we have to do that on a firmware level at all?
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Daniel Vetter
On Thu, Aug 15, 2019 at 12:47 AM Dave Airlie  wrote:
>
> On Thu, 15 Aug 2019 at 07:31, Karol Herbst  wrote:
> >
> > This reverts commit 28586a51eea666d5531bcaef2f68e4abbd87242c.
> >
> > The original commit message didn't even make sense. AMD _does_ support it 
> > and
> > it works with Nouveau as well.
> >
> > Also what was the issue being solved here? No references to any bugs and not
> > even explaining any issue at all isn't the way we do things.
> >
> > And even if it means a muxed design, then the fix is to make it work inside 
> > the
> > driver, not adding some hacky workaround through ACPI tricks.
> >
> > And what out of tree drivers do or do not support we don't care one bit 
> > anyway.
> >
>
> I think the reverts should be merged via Rafael's tree as the original
> patches went in via there, and we should get them in asap.

+1

> Acked-by: Dave Airlie 

Acked-by: Daniel Vetter 

Also fully agreeing with Karol's reply further down, if this doesn't
work we need to improve the drivers, not pile stuff on top in some
ACPI hacks.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Karol Herbst
On Thu, Aug 15, 2019 at 3:56 PM  wrote:
>
> > -Original Message-
> > From: linux-acpi-ow...@vger.kernel.org  On
> > Behalf Of Dave Airlie
> > Sent: Wednesday, August 14, 2019 5:48 PM
> > To: Karol Herbst
> > Cc: LKML; Linux ACPI; dri-devel; nouveau; Rafael J . Wysocki; Alex Hung; Ben
> > Skeggs; Dave Airlie
> > Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string 
> > to
> > enable dGPU direct output"
> >
> > On Thu, 15 Aug 2019 at 07:31, Karol Herbst  wrote:
> > >
> > > This reverts commit 28586a51eea666d5531bcaef2f68e4abbd87242c.
> > >
> > > The original commit message didn't even make sense. AMD _does_ support it 
> > > and
> > > it works with Nouveau as well.
> > >
> > > Also what was the issue being solved here? No references to any bugs and 
> > > not
> > > even explaining any issue at all isn't the way we do things.
> > >
> > > And even if it means a muxed design, then the fix is to make it work 
> > > inside the
> > > driver, not adding some hacky workaround through ACPI tricks.
> > >
> > > And what out of tree drivers do or do not support we don't care one bit 
> > > anyway.
> > >
> >
> > I think the reverts should be merged via Rafael's tree as the original
> > patches went in via there, and we should get them in asap.
> >
> > Acked-by: Dave Airlie 
> > Dave.
>
> There are definitely going to be regressions on machines in the field with the
> in tree drivers by reverting this.  I think we should have an answer for all 
> of those
> before this revert is accepted.
>
> Regarding systems with Intel+NVIDIA, we'll have to work with partners to 
> collect
> some information on the impact of reverting this.
>
> When this is used on a system with Intel+AMD the ASL configures AMD GPU to use
> "Hybrid Graphics" when on Windows and "Power Express" and "Switchable 
> Graphics"
> when on Linux.

and what's exactly the difference between those? And what's the actual
issue here?

We already have the PRIME offloading in place and if that's not
enough, we should work on extending it, not adding some ACPI based
workarounds, because that's exactly how that looks like.

Also, was this discussed with anybody involved in the drm subsystem?

>
> I feel we need a knob and/or DMI detection to affect the changes that the ASL
> normally performs.

Why do we have to do that on a firmware level at all?
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

RE: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"

2019-08-15 Thread Mario.Limonciello
> -Original Message-
> From: linux-acpi-ow...@vger.kernel.org  On
> Behalf Of Dave Airlie
> Sent: Wednesday, August 14, 2019 5:48 PM
> To: Karol Herbst
> Cc: LKML; Linux ACPI; dri-devel; nouveau; Rafael J . Wysocki; Alex Hung; Ben
> Skeggs; Dave Airlie
> Subject: Re: [Nouveau] [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to
> enable dGPU direct output"
> 
> On Thu, 15 Aug 2019 at 07:31, Karol Herbst  wrote:
> >
> > This reverts commit 28586a51eea666d5531bcaef2f68e4abbd87242c.
> >
> > The original commit message didn't even make sense. AMD _does_ support it 
> > and
> > it works with Nouveau as well.
> >
> > Also what was the issue being solved here? No references to any bugs and not
> > even explaining any issue at all isn't the way we do things.
> >
> > And even if it means a muxed design, then the fix is to make it work inside 
> > the
> > driver, not adding some hacky workaround through ACPI tricks.
> >
> > And what out of tree drivers do or do not support we don't care one bit 
> > anyway.
> >
> 
> I think the reverts should be merged via Rafael's tree as the original
> patches went in via there, and we should get them in asap.
> 
> Acked-by: Dave Airlie 
> Dave.

There are definitely going to be regressions on machines in the field with the
in tree drivers by reverting this.  I think we should have an answer for all of 
those
before this revert is accepted.

Regarding systems with Intel+NVIDIA, we'll have to work with partners to collect
some information on the impact of reverting this.

When this is used on a system with Intel+AMD the ASL configures AMD GPU to use
"Hybrid Graphics" when on Windows and "Power Express" and "Switchable Graphics"
when on Linux.

I feel we need a knob and/or DMI detection to affect the changes that the ASL
normally performs.


Re: DMA-API: cacheline tracking ENOMEM, dma-debug disabled due to nouveau ?

2019-08-15 Thread Robin Murphy

On 15/08/2019 14:35, Christoph Hellwig wrote:

On Wed, Aug 14, 2019 at 07:49:27PM +0200, Daniel Vetter wrote:

On Wed, Aug 14, 2019 at 04:50:33PM +0200, Corentin Labbe wrote:

Hello

Since lot of release (at least since 4.19), I hit the following error message:
DMA-API: cacheline tracking ENOMEM, dma-debug disabled

After hitting that, I try to check who is creating so many DMA mapping and see:
cat /sys/kernel/debug/dma-api/dump | cut -d' ' -f2 | sort | uniq -c
   6 ahci
 257 e1000e
   6 ehci-pci
5891 nouveau
  24 uhci_hcd

Does nouveau having this high number of DMA mapping is normal ?


Yeah seems perfectly fine for a gpu.


That is a lot and apparently overwhelm the dma-debug tracking.  Robin
rewrote this code in Linux 4.21 to work a little better, so I'm curious
why this might have changes in 4.19, as dma-debug did not change at
all there.


FWIW, the cacheline tracking entries are a separate thing from the 
dma-debug entries that I rejigged - judging by those numbers there 
should still be plenty of free dma-debug entries, but for some reason it 
has failed to extend the radix tree :/


Robin.


Re: [Nouveau] DMA-API: cacheline tracking ENOMEM, dma-debug disabled due to nouveau ?

2019-08-15 Thread Christoph Hellwig
On Wed, Aug 14, 2019 at 07:49:27PM +0200, Daniel Vetter wrote:
> On Wed, Aug 14, 2019 at 04:50:33PM +0200, Corentin Labbe wrote:
> > Hello
> > 
> > Since lot of release (at least since 4.19), I hit the following error 
> > message:
> > DMA-API: cacheline tracking ENOMEM, dma-debug disabled
> > 
> > After hitting that, I try to check who is creating so many DMA mapping and 
> > see:
> > cat /sys/kernel/debug/dma-api/dump | cut -d' ' -f2 | sort | uniq -c
> >   6 ahci
> > 257 e1000e
> >   6 ehci-pci
> >5891 nouveau
> >  24 uhci_hcd
> > 
> > Does nouveau having this high number of DMA mapping is normal ?
> 
> Yeah seems perfectly fine for a gpu.

That is a lot and apparently overwhelm the dma-debug tracking.  Robin
rewrote this code in Linux 4.21 to work a little better, so I'm curious
why this might have changes in 4.19, as dma-debug did not change at
all there.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] turn hmm migrate_vma upside down v3

2019-08-15 Thread Christoph Hellwig
On Wed, Aug 14, 2019 at 05:09:54PM -0700, Ralph Campbell wrote:
> Some of the patches seem to have been mangled in the mail.

Weird, I never had such a an issue with git-send-email.

But to be covered for such weird cases I also posted a git url
for exactly the tree I've been working on.

> I was able to edit them and apply to Jason's tree
> https://github.com/jgunthorpe/linux.git mmu_notifier branch.
> So for the series you can add:
>
> Tested-by: Ralph Campbell 

Thanks!
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH] volt: Fix for some cards having 0 maximum voltage

2019-08-15 Thread Karol Herbst
although I'd like to have a deeper understanding of this table, I
think it's fine as it is right now and this actually does fix
something without causing any issues (as far as we know of)

Reviewed-by: Karol Herbst 

On Fri, Aug 2, 2019 at 11:21 AM Mark Menzynski  wrote:
>
> Some, mostly Fermi, vbioses appear to have zero max voltage. That causes 
> Nouveau to not parse voltage entries, thus users not being able to set higher 
> clocks.
>
> When changing this value Nvidia driver still appeared to ignore it, and I 
> wasn't able to find out why, thus the code is ignoring the value if it is 
> zero.
>
> CC: Maarten Lankhorst 
> Signed-off-by: Mark Menzynski 
> ---
>  drm/nouveau/nvkm/subdev/bios/volt.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drm/nouveau/nvkm/subdev/bios/volt.c 
> b/drm/nouveau/nvkm/subdev/bios/volt.c
> index 7143ea46..33a9fb5a 100644
> --- a/drm/nouveau/nvkm/subdev/bios/volt.c
> +++ b/drm/nouveau/nvkm/subdev/bios/volt.c
> @@ -96,6 +96,8 @@ nvbios_volt_parse(struct nvkm_bios *bios, u8 *ver, u8 *hdr, 
> u8 *cnt, u8 *len,
> info->min = min(info->base,
> info->base + info->step * info->vidmask);
> info->max = nvbios_rd32(bios, volt + 0x0e);
> +   if (!info->max)
> +   info->max = max(info->base, info->base + info->step * 
> info->vidmask);
> break;
> case 0x50:
> info->min = nvbios_rd32(bios, volt + 0x0a);
> --
> 2.21.0
>
> ___
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau