Re: [PATCH hmm 5/6] mm/hmm: remove the CONFIG_TRANSPARENT_HUGEPAGE #ifdef
On Fri, Mar 20, 2020 at 01:49:04PM -0300, Jason Gunthorpe wrote: > From: Jason Gunthorpe > > This code can be compiled when CONFIG_TRANSPARENT_HUGEPAGE is off, so > remove the ifdef. It can compile, but will the compiler optimize it away? Seems like both pmd_trans_huge and pmd_devmap are stubs for !CONFIG_TRANSPARENT_HUGEPAGE, so yes. But that should be mentioned in the commit message. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 4/4] mm: check the device private page owner in hmm_range_fault
On Fri, Mar 20, 2020 at 10:41:09AM -0300, Jason Gunthorpe wrote: > Thinking about this some more, does the locking work out here? > > hmm_range_fault() runs with mmap_sem in read, and does not lock any of > the page table levels. > > So it relies on accessing stale pte data being safe, and here we > introduce for the first time a page pointer dereference and a pgmap > dereference without any locking/refcounting. > > The get_dev_pagemap() worked on the PFN and obtained a refcount, so it > created safety. > > Is there some tricky reason this is safe, eg a DEVICE_PRIVATE page > cannot be removed from the vma without holding mmap_sem in write or > something? I don't think there is any specific protection. Let me see if we can throw in a get_dev_pagemap here - note that current mainline doesn't even use it for this path.. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault
On Thu, Mar 19, 2020 at 09:03:45PM -0300, Jason Gunthorpe wrote: > > Should tests enable the feature or the feature enable the test? > > IMHO, if the feature is being compiled into the kernel, that should > > enable the menu item for the test. If the feature isn't selected, > > no need to test it :-) > > I ment if DEVICE_PRIVATE should be a user selectable option at all, or > should it be turned on when a driver like nouveau is selected. I don't think it should be user selectable. This is an implementation detail users can't know about. > Is there some downside to enabling DEVICE_PRIVATE? The option itself adds a little more code to the core kernel, and introduces a few additional branches in core mm code. But more importantly it pulls in the whole pgmap infrastructure. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH hmm 1/6] mm/hmm: remove pgmap checking for devmap pages
On Fri, Mar 20, 2020 at 01:49:00PM -0300, Jason Gunthorpe wrote: > From: Jason Gunthorpe > > The checking boils down to some racy check if the pagemap is still > available or not. Instead of checking this, rely entirely on the > notifiers, if a pagemap is destroyed then all pages that belong to it must > be removed from the tables and the notifiers triggered. > > Signed-off-by: Jason Gunthorpe Looks good, Reviewed-by: Christoph Hellwig ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH hmm 6/6] mm/hmm: use device_private_entry_to_pfn()
On Fri, Mar 20, 2020 at 01:49:05PM -0300, Jason Gunthorpe wrote: > From: Jason Gunthorpe > > swp_offset() should not be called directly, the wrappers are supposed to > abstract away the encoding of the device_private specific information in > the swap entry. > > Signed-off-by: Jason Gunthorpe > --- > mm/hmm.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/mm/hmm.c b/mm/hmm.c > index a09b4908e9c81a..fd9ee2b5fd9989 100644 > --- a/mm/hmm.c > +++ b/mm/hmm.c > @@ -259,8 +259,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, > unsigned long addr, >* the PFN even if not present. >*/ > if (hmm_is_device_private_entry(range, entry)) { > - *pfn = hmm_device_entry_from_pfn(range, > - swp_offset(entry)); > + *pfn = hmm_device_entry_from_pfn( > + range, device_private_entry_to_pfn(entry)); The range parameter can stay on the first line.. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH hmm 3/6] mm/hmm: remove unused code and tidy comments
On Fri, Mar 20, 2020 at 01:49:02PM -0300, Jason Gunthorpe wrote: > From: Jason Gunthorpe > > Delete several functions that are never called, fix some desync between > comments and structure content, remove an unused ret, and move one > function only used by hmm.c into hmm.c This looks good: Signed-off-by: Christoph Hellwig Btw, the top of file comment in include/linux/hmm.h really needs some work as well. In fact I think it should be mostly removed with any remaining useful bit moved to Documentation/vm/hmm.rst. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH hmm 4/6] mm/hmm: remove HMM_FAULT_SNAPSHOT
On Fri, Mar 20, 2020 at 01:49:03PM -0300, Jason Gunthorpe wrote: > + struct hmm_range *range = hmm_vma_walk->range; > unsigned int required_fault = 0; > unsigned long i; > > - if (hmm_vma_walk->flags & HMM_FAULT_SNAPSHOT) > + /* > + * If there is no way for valid to be set in hmm_pte_need_fault() then > + * don't bother to call it. > + */ > + if (!(((range->flags[HMM_PFN_VALID] & range->pfn_flags_mask) | > +range->default_flags) & > + range->flags[HMM_PFN_VALID])) I think this needs to be split out into a well documented helper function, as it is pretty much unreadable all cramed into a complex conditional. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH hmm 2/6] mm/hmm: return the fault type from hmm_pte_need_fault()
On Fri, Mar 20, 2020 at 01:49:01PM -0300, Jason Gunthorpe wrote: > +enum { > + NEED_FAULT = 1 << 0, > + NEED_WRITE_FAULT = 1 << 1, > +}; Maybe add a HMM_ prefix? > for (i = 0; i < npages; ++i) { > + required_fault |= > + hmm_pte_need_fault(hmm_vma_walk, pfns[i], cpu_flags); > + if (required_fault == (NEED_FAULT | NEED_WRITE_FAULT)) > + return required_fault; No need for the inner braces. > @@ -532,17 +515,15 @@ static int hmm_vma_walk_test(unsigned long start, > unsigned long end, >*/ > if ((vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP)) || > !(vma->vm_flags & VM_READ)) { > - bool fault, write_fault; > - No that there is no need for local variables I'd invert the test and return early: if ((vma->vm_flags & VM_READ) && !(vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP)) return 0; > /* >* Check to see if a fault is requested for any page in the >* range. >*/ > - hmm_range_need_fault(hmm_vma_walk, range->pfns + > - ((start - range->start) >> PAGE_SHIFT), > - (end - start) >> PAGE_SHIFT, > - 0, &fault, &write_fault); > - if (fault || write_fault) > + if (hmm_range_need_fault(hmm_vma_walk, > + range->pfns + > + ((start - range->start) >> > + PAGE_SHIFT), > + (end - start) >> PAGE_SHIFT, 0)) Which should help to make this a little more readable.. ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: Possibility of RX570 responsible for spontaneous reboots (MCE) with Ryzen 3700x?
Hi John, > >I know RX570 (polaris) should stay at PCI3 as far as I know. > > Yep... thought I remembered you mentioning having a 5700XT though... is that > in a different system ? I am using a RX570, the guy from reddit changed from R600 to an 5700XT and it seems it did solve his reboot problems. As the system is rock solid with windows-10 and others seem to experience similar behaviour I've decided to file a bug at the kernel's bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206903 Thanks for your suggestions & best regards, Clemens > > > From: Clemens Eisserer > Sent: March 9, 2020 2:30 AM > To: Bridgman, John ; amd-gfx@lists.freedesktop.org > > Subject: Re: Possibility of RX570 responsible for spontaneous reboots (MCE) > with Ryzen 3700x? > > Hi John, > > Thanks a lot for taking the time to look at this, even if it doesn't > seem to be GPU related at first. > > > OK, that's a bit strange... I found mce log and MCE-Ryzen-Decoder as > > options for decoding. > Sorry for omitting that information - indeed I was using > MCE-Ryzen-Decoder, thanks for pointing to mcelog. > The mce log output definitivly makes more sense, I'll try to > experiment a bit with RAM. > > Thanks also for the link to the forum, seems of all the affected users, > no one reported success in that thread. > > > For something as simple as the GPU bus interface not responding to an access > > by the CPU I think you would get a different error (bus error) but not 100% > > sure about that. > > > > My first thought would be to see if your mobo BIOS has an option to force > > PCIE > > gen3 instead of 4 and see if that makes a difference. There are some amdgpu > > module parms > > related to PCIE as well but I'm not sure which ones to recommend. > > I'll give it a try and have a look at the pcie options - but as far as > I know RX570 (polaris) should stay at PCI3 as far as I know. > Disabling IOMMU didn't help as far as I recall. > > Thanks & best regards, Clemens ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation
[AMD Official Use Only - Internal Distribution Only] I see, thanks your explanation. Regards, Rico From: Koenig, Christian Sent: Saturday, March 21, 2020 16:44 To: Yin, Tianci (Rico) Cc: amd-gfx@lists.freedesktop.org ; Xu, Feifei ; Li, Pauline ; Long, Gang ; Zhang, Hawking Subject: Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation Correct, yes. For example if you have a 16GB VRAM Vega10 in a system with just 4GB RAM you can only allocate < 4GB VRAM (actually more like ~3GB) in a single BO. Otherwise we wouldn't be able to evacuate VRAM to system memory and disk during suspend/resume or during memory pressure. Regards, Christian. Am 21.03.2020 09:32 schrieb "Yin, Tianci (Rico)" : [AMD Official Use Only - Internal Distribution Only] Hi Christian, You mean amdgpu_bo_validate_size() return false is the expectation when GTT < request < VRAM, even if VRAM size can meet the requirement, right? Thanks! Rico From: Christian König Sent: Saturday, March 21, 2020 2:27 To: Yin, Tianci (Rico) ; amd-gfx@lists.freedesktop.org Cc: Xu, Feifei ; Li, Pauline ; Long, Gang ; Zhang, Hawking Subject: Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation Am 20.03.20 um 10:46 schrieb Tianci Yin: > From: "Tianci.Yin" > > [why] > When GTT domain size is smaller than VRAM, if APP apply a very large > buffer whose size is larger than GTT but smaller than VRAM, the size > validation will fail. > > [how] > Validate VRAM domain size at first place, then GTT domain. NAK, this is intended behavior. VRAM allocations larger than GTT allocations are illegal and can crash the memory management. Regards, Christian. > > Change-Id: Ic1d31b9b0a4939e6bba0241ff79ae9aa2225ee05 > Signed-off-by: Tianci.Yin > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index 84745f9e7408..bab134b6369f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -464,21 +464,21 @@ static bool amdgpu_bo_validate_size(struct > amdgpu_device *adev, > { >struct ttm_mem_type_manager *man = NULL; > > - /* > - * If GTT is part of requested domains the check must succeed to > - * allow fall back to GTT > - */ > - if (domain & AMDGPU_GEM_DOMAIN_GTT) { > - man = &adev->mman.bdev.man[TTM_PL_TT]; > + if (domain & AMDGPU_GEM_DOMAIN_VRAM) { > + man = &adev->mman.bdev.man[TTM_PL_VRAM]; > >if (size < (man->size << PAGE_SHIFT)) >return true; > - else > + else if (!(domain & AMDGPU_GEM_DOMAIN_GTT)) >goto fail; >} > > - if (domain & AMDGPU_GEM_DOMAIN_VRAM) { > - man = &adev->mman.bdev.man[TTM_PL_VRAM]; > + /* > + * If GTT is part of requested domains the check must succeed to > + * allow fall back to GTT > + */ > + if (domain & AMDGPU_GEM_DOMAIN_GTT) { > + man = &adev->mman.bdev.man[TTM_PL_TT]; > >if (size < (man->size << PAGE_SHIFT)) >return true; ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation
Correct, yes. For example if you have a 16GB VRAM Vega10 in a system with just 4GB RAM you can only allocate < 4GB VRAM (actually more like ~3GB) in a single BO. Otherwise we wouldn't be able to evacuate VRAM to system memory and disk during suspend/resume or during memory pressure. Regards, Christian. Am 21.03.2020 09:32 schrieb "Yin, Tianci (Rico)" : [AMD Official Use Only - Internal Distribution Only] Hi Christian, You mean amdgpu_bo_validate_size() return false is the expectation when GTT < request < VRAM, even if VRAM size can meet the requirement, right? Thanks! Rico From: Christian König Sent: Saturday, March 21, 2020 2:27 To: Yin, Tianci (Rico) ; amd-gfx@lists.freedesktop.org Cc: Xu, Feifei ; Li, Pauline ; Long, Gang ; Zhang, Hawking Subject: Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation Am 20.03.20 um 10:46 schrieb Tianci Yin: > From: "Tianci.Yin" > > [why] > When GTT domain size is smaller than VRAM, if APP apply a very large > buffer whose size is larger than GTT but smaller than VRAM, the size > validation will fail. > > [how] > Validate VRAM domain size at first place, then GTT domain. NAK, this is intended behavior. VRAM allocations larger than GTT allocations are illegal and can crash the memory management. Regards, Christian. > > Change-Id: Ic1d31b9b0a4939e6bba0241ff79ae9aa2225ee05 > Signed-off-by: Tianci.Yin > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index 84745f9e7408..bab134b6369f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -464,21 +464,21 @@ static bool amdgpu_bo_validate_size(struct > amdgpu_device *adev, > { >struct ttm_mem_type_manager *man = NULL; > > - /* > - * If GTT is part of requested domains the check must succeed to > - * allow fall back to GTT > - */ > - if (domain & AMDGPU_GEM_DOMAIN_GTT) { > - man = &adev->mman.bdev.man[TTM_PL_TT]; > + if (domain & AMDGPU_GEM_DOMAIN_VRAM) { > + man = &adev->mman.bdev.man[TTM_PL_VRAM]; > >if (size < (man->size << PAGE_SHIFT)) >return true; > - else > + else if (!(domain & AMDGPU_GEM_DOMAIN_GTT)) >goto fail; >} > > - if (domain & AMDGPU_GEM_DOMAIN_VRAM) { > - man = &adev->mman.bdev.man[TTM_PL_VRAM]; > + /* > + * If GTT is part of requested domains the check must succeed to > + * allow fall back to GTT > + */ > + if (domain & AMDGPU_GEM_DOMAIN_GTT) { > + man = &adev->mman.bdev.man[TTM_PL_TT]; > >if (size < (man->size << PAGE_SHIFT)) >return true; ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation
[AMD Official Use Only - Internal Distribution Only] Hi Christian, You mean amdgpu_bo_validate_size() return false is the expectation when GTT < request < VRAM, even if VRAM size can meet the requirement, right? Thanks! Rico From: Christian K?nig Sent: Saturday, March 21, 2020 2:27 To: Yin, Tianci (Rico) ; amd-gfx@lists.freedesktop.org Cc: Xu, Feifei ; Li, Pauline ; Long, Gang ; Zhang, Hawking Subject: Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation Am 20.03.20 um 10:46 schrieb Tianci Yin: > From: "Tianci.Yin" > > [why] > When GTT domain size is smaller than VRAM, if APP apply a very large > buffer whose size is larger than GTT but smaller than VRAM, the size > validation will fail. > > [how] > Validate VRAM domain size at first place, then GTT domain. NAK, this is intended behavior. VRAM allocations larger than GTT allocations are illegal and can crash the memory management. Regards, Christian. > > Change-Id: Ic1d31b9b0a4939e6bba0241ff79ae9aa2225ee05 > Signed-off-by: Tianci.Yin > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index 84745f9e7408..bab134b6369f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -464,21 +464,21 @@ static bool amdgpu_bo_validate_size(struct > amdgpu_device *adev, > { >struct ttm_mem_type_manager *man = NULL; > > - /* > - * If GTT is part of requested domains the check must succeed to > - * allow fall back to GTT > - */ > - if (domain & AMDGPU_GEM_DOMAIN_GTT) { > - man = &adev->mman.bdev.man[TTM_PL_TT]; > + if (domain & AMDGPU_GEM_DOMAIN_VRAM) { > + man = &adev->mman.bdev.man[TTM_PL_VRAM]; > >if (size < (man->size << PAGE_SHIFT)) >return true; > - else > + else if (!(domain & AMDGPU_GEM_DOMAIN_GTT)) >goto fail; >} > > - if (domain & AMDGPU_GEM_DOMAIN_VRAM) { > - man = &adev->mman.bdev.man[TTM_PL_VRAM]; > + /* > + * If GTT is part of requested domains the check must succeed to > + * allow fall back to GTT > + */ > + if (domain & AMDGPU_GEM_DOMAIN_GTT) { > + man = &adev->mman.bdev.man[TTM_PL_TT]; > >if (size < (man->size << PAGE_SHIFT)) >return true; ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx