Re: [Nouveau] [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED

2021-10-25 Thread Ralph Campbell



On 10/24/21 21:16, Alistair Popple wrote:

MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
source page was already locked during migrate_vma_collect(). If it
wasn't then the a second attempt is made to lock the page. However if
the first attempt failed it's unlikely a second attempt will succeed,
and the retry adds complexity. So clean this up by removing the retry
and MIGRATE_PFN_LOCKED flag.

Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
set, but nothing actually checks that.

Signed-off-by: Alistair Popple 


You can add:
Reviewed-by: Ralph Campbell 


---
  Documentation/vm/hmm.rst |   2 +-
  arch/powerpc/kvm/book3s_hv_uvmem.c   |   4 +-
  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |   2 -
  drivers/gpu/drm/nouveau/nouveau_dmem.c   |   4 +-
  include/linux/migrate.h  |   1 -
  lib/test_hmm.c   |   5 +-
  mm/migrate.c | 145 +--
  7 files changed, 35 insertions(+), 128 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index a14c2938e7af..f2a59ed82ed3 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -360,7 +360,7 @@ between device driver specific code and shared common code:
 system memory page, locks the page with ``lock_page()``, and fills in the
 ``dst`` array entry with::
  
- dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;

+ dst[i] = migrate_pfn(page_to_pfn(dpage));
  
 Now that the driver knows that this page is being migrated, it can

 invalidate device private MMU mappings and copy device private memory
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index a7061ee3b157..28c436df9935 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -560,7 +560,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
  gpa, 0, page_shift);
  
  	if (ret == U_SUCCESS)

-   *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
+   *mig.dst = migrate_pfn(pfn);
else {
unlock_page(dpage);
__free_page(dpage);
@@ -774,7 +774,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
}
}
  
-	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;

+   *mig.dst = migrate_pfn(page_to_pfn(dpage));
migrate_vma_pages();
  out_finalize:
migrate_vma_finalize();
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 4a16e3c257b9..41d9417f182b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -300,7 +300,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
svm_migrate_get_vram_page(prange, migrate->dst[i]);
migrate->dst[i] = migrate_pfn(migrate->dst[i]);
-   migrate->dst[i] |= MIGRATE_PFN_LOCKED;
src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE,
  DMA_TO_DEVICE);
r = dma_mapping_error(dev, src[i]);
@@ -580,7 +579,6 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct 
svm_range *prange,
  dst[i] >> PAGE_SHIFT, page_to_pfn(dpage));
  
  		migrate->dst[i] = migrate_pfn(page_to_pfn(dpage));

-   migrate->dst[i] |= MIGRATE_PFN_LOCKED;
j++;
}
  
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c

index 92987daa5e17..3828aafd3ac4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -166,7 +166,7 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct 
nouveau_drm *drm,
goto error_dma_unmap;
mutex_unlock(>mutex);
  
-	args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;

+   args->dst[0] = migrate_pfn(page_to_pfn(dpage));
return 0;
  
  error_dma_unmap:

@@ -602,7 +602,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct 
nouveau_drm *drm,
((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
if (src & MIGRATE_PFN_WRITE)
*pfn |= NVIF_VMM_PFNMAP_V0_W;
-   return migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+   return migrate_pfn(page_to_pfn(dpage));
  
  out_dma_unmap:

dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index c8077e936691..479b861ae490 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -119,7 +119,6 @@ static inline int migrate_misplaced_page(struct page *page,
   */
  #define MIGRATE_PFN_VALID 

[Nouveau] System freeze: Debian Stable with C61 [GeForce 7025 / nForce 630a]

2021-10-25 Thread riveravaldez
Hi, this may be an old issue.
I copied at the bottom the last message of a previous solution for
this same machine.
Essentially I have random freezes (display image seems like a snow storm or just
gets frozen), nothing works, I'm forced to hard reset the machine.
Last time the solution was just to remove nouveau_dri.so, and for what
seems to be
a couple of years system worked rock-solid.
But a couple of days ago I did a system update (I'm on Debian Stable
Bullseye right
now) and apparently the problem reappeared but worse/different: now it
happens even
without nouveau_dri.so present on the system. (Meaning: I remove
nouveau_dri.so and
the freezes happen randomly anyway.)
The hardware is the same, so, I'm imaging maybe is a kernel issue?
This is the hardware:

$ lspci
00:00.0 RAM memory: NVIDIA Corporation MCP61 Host Bridge (rev a1)
00:01.0 ISA bridge: NVIDIA Corporation MCP61 LPC Bridge (rev a2)
00:01.1 SMBus: NVIDIA Corporation MCP61 SMBus (rev a2)
00:01.2 RAM memory: NVIDIA Corporation MCP61 Memory Controller (rev a2)
00:02.0 USB controller: NVIDIA Corporation MCP61 USB 1.1 Controller (rev a3)
00:02.1 USB controller: NVIDIA Corporation MCP61 USB 2.0 Controller (rev a3)
00:04.0 PCI bridge: NVIDIA Corporation MCP61 PCI bridge (rev a1)
00:05.0 Audio device: NVIDIA Corporation MCP61 High Definition Audio (rev a2)
00:06.0 IDE interface: NVIDIA Corporation MCP61 IDE (rev a2)
00:07.0 Bridge: NVIDIA Corporation MCP61 Ethernet (rev a2)
00:08.0 IDE interface: NVIDIA Corporation MCP61 SATA Controller (rev a2)
00:08.1 IDE interface: NVIDIA Corporation MCP61 SATA Controller (rev a2)
00:0d.0 VGA compatible controller: NVIDIA Corporation C61 [GeForce
7025 / nForce 630a] (rev a2)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h
Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h
Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h
Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h
Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h
Processor Link Control

With:

$ uname -a
Linux debian 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30)
x86_64 GNU/Linux

And:

$ apt-cache policy xserver-xorg-video-nouveau
xserver-xorg-video-nouveau:
  Instalados: 1:1.0.17-1
  Candidato:  1:1.0.17-1
  Tabla de versión:
 *** 1:1.0.17-1 500
500 https://deb.debian.org/debian bullseye/main amd64 Packages
100 /var/lib/dpkg/status

And this seems to be all the info I have:

$ sudo journalctl -S 2021-10-21 -x -p 4 | grep nouveau
oct 21 13:16:43 debian kernel: nouveau :00:0d.0: DRM: DCB type 4 not known
oct 21 13:16:43 debian kernel: nouveau :00:0d.0: DRM: Unknown-1
has no encoders, removing
oct 21 13:17:00 debian kernel: nouveau :00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 21 17:12:05 debian kernel: nouveau :00:0d.0: DRM: DCB type 4 not known
oct 21 17:12:05 debian kernel: nouveau :00:0d.0: DRM: Unknown-1
has no encoders, removing
oct 21 17:12:18 debian kernel: nouveau :00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 21 17:22:21 debian kernel: nouveau :00:0d.0: bus: MMIO write
of 014b0001 FAULT at 00b010
oct 21 17:22:21 debian kernel: nouveau :00:0d.0: bus: MMIO write
of 014b0001 FAULT at 00b010
oct 21 21:32:55 debian kernel:  autofs4 ext4 crc16 mbcache jbd2
crc32c_generic sd_mod t10_pi crc_t10dif crct10dif_generic
crct10dif_common nouveau video mxm_wmi wmi i2c_algo_bit drm_kms_helper
cec ttm ata_generic sata_nv drm libata scsi_mod psmouse serio_raw
evdev button
oct 21 22:22:53 debian kernel: nouveau :00:0d.0: bus: MMIO write
of 01f20001 FAULT at 00b020
oct 21 22:23:14 debian kernel: nouveau :00:0d.0: bus: MMIO write
of  FAULT at 00b020
oct 21 22:23:23 debian kernel: nouveau :00:0d.0: bus: MMIO write
of 01f20001 FAULT at 00b020
oct 21 22:25:32 debian kernel: nouveau :00:0d.0: DRM: DCB type 4 not known
oct 21 22:25:32 debian kernel: nouveau :00:0d.0: DRM: Unknown-1
has no encoders, removing
oct 21 22:26:03 debian kernel: nouveau :00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 21 22:46:40 debian kernel: nouveau :00:0d.0: DRM: DCB type 4 not known
oct 21 22:46:40 debian kernel: nouveau :00:0d.0: DRM: Unknown-1
has no encoders, removing
oct 21 22:46:51 debian kernel: nouveau :00:0d.0: bus: MMIO write
of 005c0001 FAULT at 00b000
oct 21 22:59:55 debian kernel:  crct10dif_common nouveau video mxm_wmi
wmi i2c_algo_bit drm_kms_helper ata_generic sata_nv cec libata ttm drm
scsi_mod psmouse evdev serio_raw button
oct 21 23:44:26 debian kernel: nouveau :00:0d.0: bus: MMIO write
of 01570001 FAULT at 00b010
oct 21 23:44:26 debian kernel: nouveau :00:0d.0: bus: MMIO write
of 01ef0001 FAULT at 00b020
oct 21 23:45:08 debian kernel: nouveau :00:0d.0: bus: MMIO write
of 02550001 FAULT at 00b030
oct 21 23:45:09 debian kernel: nouveau 

[Nouveau] [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED

2021-10-25 Thread Alistair Popple
MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
source page was already locked during migrate_vma_collect(). If it
wasn't then the a second attempt is made to lock the page. However if
the first attempt failed it's unlikely a second attempt will succeed,
and the retry adds complexity. So clean this up by removing the retry
and MIGRATE_PFN_LOCKED flag.

Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
set, but nothing actually checks that.

Signed-off-by: Alistair Popple 
---
 Documentation/vm/hmm.rst |   2 +-
 arch/powerpc/kvm/book3s_hv_uvmem.c   |   4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |   2 -
 drivers/gpu/drm/nouveau/nouveau_dmem.c   |   4 +-
 include/linux/migrate.h  |   1 -
 lib/test_hmm.c   |   5 +-
 mm/migrate.c | 145 +--
 7 files changed, 35 insertions(+), 128 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index a14c2938e7af..f2a59ed82ed3 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -360,7 +360,7 @@ between device driver specific code and shared common code:
system memory page, locks the page with ``lock_page()``, and fills in the
``dst`` array entry with::
 
- dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+ dst[i] = migrate_pfn(page_to_pfn(dpage));
 
Now that the driver knows that this page is being migrated, it can
invalidate device private MMU mappings and copy device private memory
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c 
b/arch/powerpc/kvm/book3s_hv_uvmem.c
index a7061ee3b157..28c436df9935 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -560,7 +560,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
  gpa, 0, page_shift);
 
if (ret == U_SUCCESS)
-   *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
+   *mig.dst = migrate_pfn(pfn);
else {
unlock_page(dpage);
__free_page(dpage);
@@ -774,7 +774,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
}
}
 
-   *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+   *mig.dst = migrate_pfn(page_to_pfn(dpage));
migrate_vma_pages();
 out_finalize:
migrate_vma_finalize();
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 4a16e3c257b9..41d9417f182b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -300,7 +300,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct 
svm_range *prange,
migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
svm_migrate_get_vram_page(prange, migrate->dst[i]);
migrate->dst[i] = migrate_pfn(migrate->dst[i]);
-   migrate->dst[i] |= MIGRATE_PFN_LOCKED;
src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE,
  DMA_TO_DEVICE);
r = dma_mapping_error(dev, src[i]);
@@ -580,7 +579,6 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct 
svm_range *prange,
  dst[i] >> PAGE_SHIFT, page_to_pfn(dpage));
 
migrate->dst[i] = migrate_pfn(page_to_pfn(dpage));
-   migrate->dst[i] |= MIGRATE_PFN_LOCKED;
j++;
}
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c 
b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 92987daa5e17..3828aafd3ac4 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -166,7 +166,7 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct 
nouveau_drm *drm,
goto error_dma_unmap;
mutex_unlock(>mutex);
 
-   args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+   args->dst[0] = migrate_pfn(page_to_pfn(dpage));
return 0;
 
 error_dma_unmap:
@@ -602,7 +602,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct 
nouveau_drm *drm,
((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
if (src & MIGRATE_PFN_WRITE)
*pfn |= NVIF_VMM_PFNMAP_V0_W;
-   return migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+   return migrate_pfn(page_to_pfn(dpage));
 
 out_dma_unmap:
dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index c8077e936691..479b861ae490 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -119,7 +119,6 @@ static inline int migrate_misplaced_page(struct page *page,
  */
 #define MIGRATE_PFN_VALID  (1UL << 0)
 #define MIGRATE_PFN_MIGRATE(1UL << 1)
-#define MIGRATE_PFN_LOCKED (1UL << 2)
 

Re: [Nouveau] [Intel-gfx] [PATCH v3 13/13] drm/i915: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi

2021-10-25 Thread Claudio Suarez
On Fri, Oct 22, 2021 at 03:22:57PM +0300, Ville Syrjälä wrote:
> On Fri, Oct 22, 2021 at 03:01:52PM +0300, Ville Syrjälä wrote:
> > On Fri, Oct 22, 2021 at 12:25:33PM +0200, Claudio Suarez wrote:
> > > On Thu, Oct 21, 2021 at 04:49:59PM +0300, Ville Syrjälä wrote:
> > > > On Wed, Oct 20, 2021 at 12:51:21AM +0200, Claudio Suarez wrote:
> > > > > drm_get_edid() internally calls to 
> > > > > drm_connector_update_edid_property()
> > > > > and then drm_add_display_info(), which parses the EDID.
> > > > > This happens in the function intel_hdmi_set_edid() and
> > > > > intel_sdvo_tmds_sink_detect() (via intel_sdvo_get_edid()).
> > > > > 
> > > > > Once EDID is parsed, the monitor HDMI support information is available
> > > > > through drm_display_info.is_hdmi. Retriving the same information with
> > > > > drm_detect_hdmi_monitor() is less efficient. Change to
> > > > > drm_display_info.is_hdmi
> > > > 
> > > > I meant we need to examine all call chains that can lead to
> > > > .detect() to make sure all of them do in fact update the
> > > > display_info beforehand.
> > > 
> > > Well, I studied it carefully and, yes, all call chains that can lead to
> > > drm_display_info.is_hdmi / drm_detect_hdmi_monitor() update display_info
> > > beforehand. In the case that this doesn't happen, the code is unchanged.
> > > 
> > > Do you want I explain the changes in the code here again ? Or do you want
> > > to me change the commit message to be more clear ? In the first case, I 
> > > can
> > > write here a detailed explanation. In the second case I can make a longer 
> > > commit
> > > message.
> > > 
> > > Or both?
> > 
> > I want all those call chains explained in the commit message,
> > otherwise I have no easy way to confirm whether the change
> > is correct or not.
> 
> Hmm. OK, so I had a bit of a dig around and seems that what we do now
> .detect()->drm_get_edid()->drm_connector_update_edid_property()->drm_add_display_info()

Yes. I said before that I felt something was wrong when I read the
documentation and then the code. To be more explicit now, I expected that
drm_connector_update_edid_property() will be done in the
fill_modes/get_modes phase instead of when reading the edid.
The documentation suggests that but the code reads the edid in the
detect phase.
Now, since drm_connector_update_edid_property() is called in the detect
phase, it is not necessary to keep the edid data in the private connector
struct. It is in struct drm_connector from the beginning.
But this is topic for another patch.

> Now the question is when did that start happening? Looks like it was
> commit 4b4df570b41d ("drm: Update edid-derived drm_display_info fields
> at edid property set [v2]") that started to call drm_add_display_info()
> from drm_connector_update_edid_property(), and then commit 5186421cbfe2
> ("drm: Introduce epoch counter to drm_connector") started to call
> drm_connector_update_edid_property() from drm_get_edid(). Before both
> of those commits were in place display_info would still contain
> some stale garbage during .detect().
>
> That is the story I think we want in these commit messages since it
> a) explains why the old code was directly parsing the edid instead
> b) why it's now safe to change this

--commit-message?

drm/i915: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi

Commit a92d083d08b0 created the new flag is_hdmi in drm_display_info
which is set when sink compliant with CEA-861 (EDID) shall be treated
as an HDMI sink.

>From that day, this value can be used in some cases instead of
calling drm_detect_hdmi_monitor() and a second parse is avoided
because drm_detect_hdmi_monitor() parses. A TODO task was
registered in Documentation/gpu/todo.rst to perform that task in
the future.

The flag drm_display_info.is_hdmi is set in the function
drm_add_display_info(), which is called from
drm_connector_update_edid_property(). Since commit 5186421cbfe2,
drm_get_edid() calls drm_connector_update_edid_property() when
reading the edid data from an i2c adapter. Therefore, in these
cases drm_display_info.is_hdmi is updated to its correct
value when returning from drm_get_edid().

Replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi
in the cases when drm_detect_hdmi_monitor() is called after a
read from an i2c adapter using drm_get_edid() in the i915 driver.
---

> 
> PS. connector->force handling in drm_get_edid() looks a bit busted
> since it doesn't call drm_connector_update_edid_property() at all
> in some cases. I think there might be some path that leads there
> anywya if/when we change connector->force, but we should fix
> drm_get_edid() to do the right thing regarless.

In those cases, the edid isn't read and NULL is returned by drm_get_edid().
No problem because display_info.is_hdmi is inside an if (edid != NULL).

BTW, struct intel_connector is allocated with kzalloc, so the initial
value of is_hdmi is