Re: [PATCH hmm 5/6] mm/hmm: remove the CONFIG_TRANSPARENT_HUGEPAGE #ifdef

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 01:49:04PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> This code can be compiled when CONFIG_TRANSPARENT_HUGEPAGE is off, so
> remove the ifdef.

It can compile, but will the compiler optimize it away?  Seems like
both pmd_trans_huge and pmd_devmap are stubs for
!CONFIG_TRANSPARENT_HUGEPAGE, so yes.  But that should be mentioned
in the commit message.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 4/4] mm: check the device private page owner in hmm_range_fault

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 10:41:09AM -0300, Jason Gunthorpe wrote:
> Thinking about this some more, does the locking work out here?
> 
> hmm_range_fault() runs with mmap_sem in read, and does not lock any of
> the page table levels.
> 
> So it relies on accessing stale pte data being safe, and here we
> introduce for the first time a page pointer dereference and a pgmap
> dereference without any locking/refcounting.
> 
> The get_dev_pagemap() worked on the PFN and obtained a refcount, so it
> created safety.
> 
> Is there some tricky reason this is safe, eg a DEVICE_PRIVATE page
> cannot be removed from the vma without holding mmap_sem in write or
> something?

I don't think there is any specific protection.  Let me see if we
can throw in a get_dev_pagemap here - note that current mainline doesn't
even use it for this path..
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault

2020-03-21 Thread Christoph Hellwig
On Thu, Mar 19, 2020 at 09:03:45PM -0300, Jason Gunthorpe wrote:
> > Should tests enable the feature or the feature enable the test?
> > IMHO, if the feature is being compiled into the kernel, that should
> > enable the menu item for the test. If the feature isn't selected,
> > no need to test it :-)
> 
> I ment if DEVICE_PRIVATE should be a user selectable option at all, or
> should it be turned on when a driver like nouveau is selected.

I don't think it should be user selectable.  This is an implementation
detail users can't know about.

> Is there some downside to enabling DEVICE_PRIVATE?

The option itself adds a little more code to the core kernel, and
introduces a few additional branches in core mm code.

But more importantly it pulls in the whole pgmap infrastructure.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH hmm 1/6] mm/hmm: remove pgmap checking for devmap pages

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 01:49:00PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> The checking boils down to some racy check if the pagemap is still
> available or not. Instead of checking this, rely entirely on the
> notifiers, if a pagemap is destroyed then all pages that belong to it must
> be removed from the tables and the notifiers triggered.
> 
> Signed-off-by: Jason Gunthorpe 

Looks good,

Reviewed-by: Christoph Hellwig 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH hmm 6/6] mm/hmm: use device_private_entry_to_pfn()

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 01:49:05PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> swp_offset() should not be called directly, the wrappers are supposed to
> abstract away the encoding of the device_private specific information in
> the swap entry.
> 
> Signed-off-by: Jason Gunthorpe 
> ---
>  mm/hmm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/hmm.c b/mm/hmm.c
> index a09b4908e9c81a..fd9ee2b5fd9989 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -259,8 +259,8 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, 
> unsigned long addr,
>* the PFN even if not present.
>*/
>   if (hmm_is_device_private_entry(range, entry)) {
> - *pfn = hmm_device_entry_from_pfn(range,
> - swp_offset(entry));
> + *pfn = hmm_device_entry_from_pfn(
> + range, device_private_entry_to_pfn(entry));

The range parameter can stay on the first line..
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH hmm 3/6] mm/hmm: remove unused code and tidy comments

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 01:49:02PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe 
> 
> Delete several functions that are never called, fix some desync between
> comments and structure content, remove an unused ret, and move one
> function only used by hmm.c into hmm.c

This looks good:

Signed-off-by: Christoph Hellwig 

Btw, the top of file comment in include/linux/hmm.h really needs some
work as well.  In fact I think it should be mostly removed with any
remaining useful bit moved to Documentation/vm/hmm.rst.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH hmm 4/6] mm/hmm: remove HMM_FAULT_SNAPSHOT

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 01:49:03PM -0300, Jason Gunthorpe wrote:
> + struct hmm_range *range = hmm_vma_walk->range;
>   unsigned int required_fault = 0;
>   unsigned long i;
>  
> - if (hmm_vma_walk->flags & HMM_FAULT_SNAPSHOT)
> + /*
> +  * If there is no way for valid to be set in hmm_pte_need_fault() then
> +  * don't bother to call it.
> +  */
> + if (!(((range->flags[HMM_PFN_VALID] & range->pfn_flags_mask) |
> +range->default_flags) &
> +   range->flags[HMM_PFN_VALID]))

I think this needs to be split out into a well documented helper
function, as it is pretty much unreadable all cramed into a complex
conditional.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH hmm 2/6] mm/hmm: return the fault type from hmm_pte_need_fault()

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 01:49:01PM -0300, Jason Gunthorpe wrote:
> +enum {
> + NEED_FAULT = 1 << 0,
> + NEED_WRITE_FAULT = 1 << 1,
> +};

Maybe add a HMM_ prefix?

>   for (i = 0; i < npages; ++i) {
> + required_fault |=
> + hmm_pte_need_fault(hmm_vma_walk, pfns[i], cpu_flags);
> + if (required_fault == (NEED_FAULT | NEED_WRITE_FAULT))
> + return required_fault;

No need for the inner braces.

> @@ -532,17 +515,15 @@ static int hmm_vma_walk_test(unsigned long start, 
> unsigned long end,
>*/
>   if ((vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP)) ||
>   !(vma->vm_flags & VM_READ)) {
> - bool fault, write_fault;
> -

No that there is no need for local variables I'd invert the test and
return early:

if ((vma->vm_flags & VM_READ) &&
!(vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))
return 0;

>   /*
>* Check to see if a fault is requested for any page in the
>* range.
>*/
> - hmm_range_need_fault(hmm_vma_walk, range->pfns +
> - ((start - range->start) >> PAGE_SHIFT),
> - (end - start) >> PAGE_SHIFT,
> - 0, &fault, &write_fault);
> - if (fault || write_fault)
> + if (hmm_range_need_fault(hmm_vma_walk,
> +  range->pfns +
> +  ((start - range->start) >>
> +   PAGE_SHIFT),
> +  (end - start) >> PAGE_SHIFT, 0))

Which should help to make this a little more readable..
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: Possibility of RX570 responsible for spontaneous reboots (MCE) with Ryzen 3700x?

2020-03-21 Thread Clemens Eisserer
Hi John,

> >I know RX570 (polaris) should stay at PCI3 as far as I know.
>
> Yep... thought I remembered you mentioning having a 5700XT though... is that 
> in a different system ?

I am using a RX570, the guy from reddit changed from R600 to an 5700XT
and it seems it did solve his reboot problems.

As the system is rock solid with windows-10 and others seem to
experience similar behaviour I've decided to file a bug at the
kernel's bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=206903

Thanks for your suggestions & best regards, Clemens


>
> 
> From: Clemens Eisserer 
> Sent: March 9, 2020 2:30 AM
> To: Bridgman, John ; amd-gfx@lists.freedesktop.org 
> 
> Subject: Re: Possibility of RX570 responsible for spontaneous reboots (MCE) 
> with Ryzen 3700x?
>
> Hi John,
>
> Thanks a lot for taking the time to look at this, even if it doesn't
> seem to be GPU related at first.
>
> > OK, that's a bit strange... I found mce log and MCE-Ryzen-Decoder as 
> > options for decoding.
> Sorry for omitting that information - indeed I was using
> MCE-Ryzen-Decoder, thanks for pointing to mcelog.
> The mce log output definitivly makes more sense, I'll try to
> experiment a bit with RAM.
>
> Thanks also for the link to the forum, seems of all the affected users,
> no one reported success in that thread.
>
> > For something as simple as the GPU bus interface not responding to an access
> > by the CPU I think you would get a different error (bus error) but not 100% 
> > sure about that.
> >
> > My first thought would be to see if your mobo BIOS has an option to force 
> > PCIE
> > gen3 instead of 4 and see if that makes a difference. There are some amdgpu 
> > module parms
> > related to PCIE as well but I'm not sure which ones to recommend.
>
> I'll give it a try and have a look at the pcie options - but as far as
> I know RX570 (polaris) should stay at PCI3 as far as I know.
> Disabling IOMMU didn't help as far as I recall.
>
> Thanks & best regards, Clemens
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation

2020-03-21 Thread Yin, Tianci (Rico)
[AMD Official Use Only - Internal Distribution Only]

I see, thanks your explanation.

Regards,
Rico

From: Koenig, Christian 
Sent: Saturday, March 21, 2020 16:44
To: Yin, Tianci (Rico) 
Cc: amd-gfx@lists.freedesktop.org ; Xu, Feifei 
; Li, Pauline ; Long, Gang 
; Zhang, Hawking 
Subject: Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer 
creation

Correct, yes.

For example if you have a 16GB VRAM Vega10 in a system with just 4GB RAM you 
can only allocate < 4GB VRAM (actually more like ~3GB) in a single BO.

Otherwise we wouldn't be able to evacuate VRAM to system memory and disk during 
suspend/resume or during memory pressure.

Regards,
Christian.

Am 21.03.2020 09:32 schrieb "Yin, Tianci (Rico)" :

[AMD Official Use Only - Internal Distribution Only]

Hi Christian,
You mean amdgpu_bo_validate_size() return false is the expectation when GTT < 
request < VRAM, even if VRAM size can meet the requirement, right?

Thanks!
Rico

From: Christian König 
Sent: Saturday, March 21, 2020 2:27
To: Yin, Tianci (Rico) ; amd-gfx@lists.freedesktop.org 

Cc: Xu, Feifei ; Li, Pauline ; Long, 
Gang ; Zhang, Hawking 
Subject: Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer 
creation

Am 20.03.20 um 10:46 schrieb Tianci Yin:
> From: "Tianci.Yin" 
>
> [why]
> When GTT domain size is smaller than VRAM, if APP apply a very large
> buffer whose size is larger than GTT but smaller than VRAM, the size
> validation will fail.
>
> [how]
> Validate VRAM domain size at first place, then GTT domain.

NAK, this is intended behavior. VRAM allocations larger than GTT
allocations are illegal and can crash the memory management.

Regards,
Christian.

>
> Change-Id: Ic1d31b9b0a4939e6bba0241ff79ae9aa2225ee05
> Signed-off-by: Tianci.Yin 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 18 +-
>   1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 84745f9e7408..bab134b6369f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -464,21 +464,21 @@ static bool amdgpu_bo_validate_size(struct 
> amdgpu_device *adev,
>   {
>struct ttm_mem_type_manager *man = NULL;
>
> - /*
> -  * If GTT is part of requested domains the check must succeed to
> -  * allow fall back to GTT
> -  */
> - if (domain & AMDGPU_GEM_DOMAIN_GTT) {
> - man = &adev->mman.bdev.man[TTM_PL_TT];
> + if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
> + man = &adev->mman.bdev.man[TTM_PL_VRAM];
>
>if (size < (man->size << PAGE_SHIFT))
>return true;
> - else
> + else if (!(domain & AMDGPU_GEM_DOMAIN_GTT))
>goto fail;
>}
>
> - if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
> - man = &adev->mman.bdev.man[TTM_PL_VRAM];
> + /*
> +  * If GTT is part of requested domains the check must succeed to
> +  * allow fall back to GTT
> +  */
> + if (domain & AMDGPU_GEM_DOMAIN_GTT) {
> + man = &adev->mman.bdev.man[TTM_PL_TT];
>
>if (size < (man->size << PAGE_SHIFT))
>return true;

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation

2020-03-21 Thread Koenig, Christian
Correct, yes.

For example if you have a 16GB VRAM Vega10 in a system with just 4GB RAM you 
can only allocate < 4GB VRAM (actually more like ~3GB) in a single BO.

Otherwise we wouldn't be able to evacuate VRAM to system memory and disk during 
suspend/resume or during memory pressure.

Regards,
Christian.

Am 21.03.2020 09:32 schrieb "Yin, Tianci (Rico)" :

[AMD Official Use Only - Internal Distribution Only]

Hi Christian,
You mean amdgpu_bo_validate_size() return false is the expectation when GTT < 
request < VRAM, even if VRAM size can meet the requirement, right?

Thanks!
Rico

From: Christian König 
Sent: Saturday, March 21, 2020 2:27
To: Yin, Tianci (Rico) ; amd-gfx@lists.freedesktop.org 

Cc: Xu, Feifei ; Li, Pauline ; Long, 
Gang ; Zhang, Hawking 
Subject: Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer 
creation

Am 20.03.20 um 10:46 schrieb Tianci Yin:
> From: "Tianci.Yin" 
>
> [why]
> When GTT domain size is smaller than VRAM, if APP apply a very large
> buffer whose size is larger than GTT but smaller than VRAM, the size
> validation will fail.
>
> [how]
> Validate VRAM domain size at first place, then GTT domain.

NAK, this is intended behavior. VRAM allocations larger than GTT
allocations are illegal and can crash the memory management.

Regards,
Christian.

>
> Change-Id: Ic1d31b9b0a4939e6bba0241ff79ae9aa2225ee05
> Signed-off-by: Tianci.Yin 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 18 +-
>   1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 84745f9e7408..bab134b6369f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -464,21 +464,21 @@ static bool amdgpu_bo_validate_size(struct 
> amdgpu_device *adev,
>   {
>struct ttm_mem_type_manager *man = NULL;
>
> - /*
> -  * If GTT is part of requested domains the check must succeed to
> -  * allow fall back to GTT
> -  */
> - if (domain & AMDGPU_GEM_DOMAIN_GTT) {
> - man = &adev->mman.bdev.man[TTM_PL_TT];
> + if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
> + man = &adev->mman.bdev.man[TTM_PL_VRAM];
>
>if (size < (man->size << PAGE_SHIFT))
>return true;
> - else
> + else if (!(domain & AMDGPU_GEM_DOMAIN_GTT))
>goto fail;
>}
>
> - if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
> - man = &adev->mman.bdev.man[TTM_PL_VRAM];
> + /*
> +  * If GTT is part of requested domains the check must succeed to
> +  * allow fall back to GTT
> +  */
> + if (domain & AMDGPU_GEM_DOMAIN_GTT) {
> + man = &adev->mman.bdev.man[TTM_PL_TT];
>
>if (size < (man->size << PAGE_SHIFT))
>return true;

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation

2020-03-21 Thread Yin, Tianci (Rico)
[AMD Official Use Only - Internal Distribution Only]

Hi Christian,
You mean amdgpu_bo_validate_size() return false is the expectation when GTT < 
request < VRAM, even if VRAM size can meet the requirement, right?

Thanks!
Rico

From: Christian K?nig 
Sent: Saturday, March 21, 2020 2:27
To: Yin, Tianci (Rico) ; amd-gfx@lists.freedesktop.org 

Cc: Xu, Feifei ; Li, Pauline ; Long, 
Gang ; Zhang, Hawking 
Subject: Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer 
creation

Am 20.03.20 um 10:46 schrieb Tianci Yin:
> From: "Tianci.Yin" 
>
> [why]
> When GTT domain size is smaller than VRAM, if APP apply a very large
> buffer whose size is larger than GTT but smaller than VRAM, the size
> validation will fail.
>
> [how]
> Validate VRAM domain size at first place, then GTT domain.

NAK, this is intended behavior. VRAM allocations larger than GTT
allocations are illegal and can crash the memory management.

Regards,
Christian.

>
> Change-Id: Ic1d31b9b0a4939e6bba0241ff79ae9aa2225ee05
> Signed-off-by: Tianci.Yin 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 18 +-
>   1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 84745f9e7408..bab134b6369f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -464,21 +464,21 @@ static bool amdgpu_bo_validate_size(struct 
> amdgpu_device *adev,
>   {
>struct ttm_mem_type_manager *man = NULL;
>
> - /*
> -  * If GTT is part of requested domains the check must succeed to
> -  * allow fall back to GTT
> -  */
> - if (domain & AMDGPU_GEM_DOMAIN_GTT) {
> - man = &adev->mman.bdev.man[TTM_PL_TT];
> + if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
> + man = &adev->mman.bdev.man[TTM_PL_VRAM];
>
>if (size < (man->size << PAGE_SHIFT))
>return true;
> - else
> + else if (!(domain & AMDGPU_GEM_DOMAIN_GTT))
>goto fail;
>}
>
> - if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
> - man = &adev->mman.bdev.man[TTM_PL_VRAM];
> + /*
> +  * If GTT is part of requested domains the check must succeed to
> +  * allow fall back to GTT
> +  */
> + if (domain & AMDGPU_GEM_DOMAIN_GTT) {
> + man = &adev->mman.bdev.man[TTM_PL_TT];
>
>if (size < (man->size << PAGE_SHIFT))
>return true;

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx