Re: [Nouveau] [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED
On 10/24/21 21:16, Alistair Popple wrote: MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a source page was already locked during migrate_vma_collect(). If it wasn't then the a second attempt is made to lock the page. However if the first attempt failed it's unlikely a second attempt will succeed, and the retry adds complexity. So clean this up by removing the retry and MIGRATE_PFN_LOCKED flag. Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag set, but nothing actually checks that. Signed-off-by: Alistair Popple You can add: Reviewed-by: Ralph Campbell --- Documentation/vm/hmm.rst | 2 +- arch/powerpc/kvm/book3s_hv_uvmem.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 - drivers/gpu/drm/nouveau/nouveau_dmem.c | 4 +- include/linux/migrate.h | 1 - lib/test_hmm.c | 5 +- mm/migrate.c | 145 +-- 7 files changed, 35 insertions(+), 128 deletions(-) diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst index a14c2938e7af..f2a59ed82ed3 100644 --- a/Documentation/vm/hmm.rst +++ b/Documentation/vm/hmm.rst @@ -360,7 +360,7 @@ between device driver specific code and shared common code: system memory page, locks the page with ``lock_page()``, and fills in the ``dst`` array entry with:: - dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + dst[i] = migrate_pfn(page_to_pfn(dpage)); Now that the driver knows that this page is being migrated, it can invalidate device private MMU mappings and copy device private memory diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index a7061ee3b157..28c436df9935 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -560,7 +560,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma, gpa, 0, page_shift); if (ret == U_SUCCESS) - *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED; + *mig.dst = migrate_pfn(pfn); else { unlock_page(dpage); __free_page(dpage); @@ -774,7 +774,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, } } - *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + *mig.dst = migrate_pfn(page_to_pfn(dpage)); migrate_vma_pages(); out_finalize: migrate_vma_finalize(); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 4a16e3c257b9..41d9417f182b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -300,7 +300,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange, migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]); svm_migrate_get_vram_page(prange, migrate->dst[i]); migrate->dst[i] = migrate_pfn(migrate->dst[i]); - migrate->dst[i] |= MIGRATE_PFN_LOCKED; src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE, DMA_TO_DEVICE); r = dma_mapping_error(dev, src[i]); @@ -580,7 +579,6 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange, dst[i] >> PAGE_SHIFT, page_to_pfn(dpage)); migrate->dst[i] = migrate_pfn(page_to_pfn(dpage)); - migrate->dst[i] |= MIGRATE_PFN_LOCKED; j++; } diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 92987daa5e17..3828aafd3ac4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -166,7 +166,7 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm, goto error_dma_unmap; mutex_unlock(>mutex); - args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + args->dst[0] = migrate_pfn(page_to_pfn(dpage)); return 0; error_dma_unmap: @@ -602,7 +602,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm, ((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT); if (src & MIGRATE_PFN_WRITE) *pfn |= NVIF_VMM_PFNMAP_V0_W; - return migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + return migrate_pfn(page_to_pfn(dpage)); out_dma_unmap: dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); diff --git a/include/linux/migrate.h b/include/linux/migrate.h index c8077e936691..479b861ae490 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -119,7 +119,6 @@ static inline int migrate_misplaced_page(struct page *page, */ #define MIGRATE_PFN_VALID
[Nouveau] System freeze: Debian Stable with C61 [GeForce 7025 / nForce 630a]
Hi, this may be an old issue. I copied at the bottom the last message of a previous solution for this same machine. Essentially I have random freezes (display image seems like a snow storm or just gets frozen), nothing works, I'm forced to hard reset the machine. Last time the solution was just to remove nouveau_dri.so, and for what seems to be a couple of years system worked rock-solid. But a couple of days ago I did a system update (I'm on Debian Stable Bullseye right now) and apparently the problem reappeared but worse/different: now it happens even without nouveau_dri.so present on the system. (Meaning: I remove nouveau_dri.so and the freezes happen randomly anyway.) The hardware is the same, so, I'm imaging maybe is a kernel issue? This is the hardware: $ lspci 00:00.0 RAM memory: NVIDIA Corporation MCP61 Host Bridge (rev a1) 00:01.0 ISA bridge: NVIDIA Corporation MCP61 LPC Bridge (rev a2) 00:01.1 SMBus: NVIDIA Corporation MCP61 SMBus (rev a2) 00:01.2 RAM memory: NVIDIA Corporation MCP61 Memory Controller (rev a2) 00:02.0 USB controller: NVIDIA Corporation MCP61 USB 1.1 Controller (rev a3) 00:02.1 USB controller: NVIDIA Corporation MCP61 USB 2.0 Controller (rev a3) 00:04.0 PCI bridge: NVIDIA Corporation MCP61 PCI bridge (rev a1) 00:05.0 Audio device: NVIDIA Corporation MCP61 High Definition Audio (rev a2) 00:06.0 IDE interface: NVIDIA Corporation MCP61 IDE (rev a2) 00:07.0 Bridge: NVIDIA Corporation MCP61 Ethernet (rev a2) 00:08.0 IDE interface: NVIDIA Corporation MCP61 SATA Controller (rev a2) 00:08.1 IDE interface: NVIDIA Corporation MCP61 SATA Controller (rev a2) 00:0d.0 VGA compatible controller: NVIDIA Corporation C61 [GeForce 7025 / nForce 630a] (rev a2) 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor HyperTransport Configuration 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Address Map 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Miscellaneous Control 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Link Control With: $ uname -a Linux debian 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64 GNU/Linux And: $ apt-cache policy xserver-xorg-video-nouveau xserver-xorg-video-nouveau: Instalados: 1:1.0.17-1 Candidato: 1:1.0.17-1 Tabla de versión: *** 1:1.0.17-1 500 500 https://deb.debian.org/debian bullseye/main amd64 Packages 100 /var/lib/dpkg/status And this seems to be all the info I have: $ sudo journalctl -S 2021-10-21 -x -p 4 | grep nouveau oct 21 13:16:43 debian kernel: nouveau :00:0d.0: DRM: DCB type 4 not known oct 21 13:16:43 debian kernel: nouveau :00:0d.0: DRM: Unknown-1 has no encoders, removing oct 21 13:17:00 debian kernel: nouveau :00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000 oct 21 17:12:05 debian kernel: nouveau :00:0d.0: DRM: DCB type 4 not known oct 21 17:12:05 debian kernel: nouveau :00:0d.0: DRM: Unknown-1 has no encoders, removing oct 21 17:12:18 debian kernel: nouveau :00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000 oct 21 17:22:21 debian kernel: nouveau :00:0d.0: bus: MMIO write of 014b0001 FAULT at 00b010 oct 21 17:22:21 debian kernel: nouveau :00:0d.0: bus: MMIO write of 014b0001 FAULT at 00b010 oct 21 21:32:55 debian kernel: autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sd_mod t10_pi crc_t10dif crct10dif_generic crct10dif_common nouveau video mxm_wmi wmi i2c_algo_bit drm_kms_helper cec ttm ata_generic sata_nv drm libata scsi_mod psmouse serio_raw evdev button oct 21 22:22:53 debian kernel: nouveau :00:0d.0: bus: MMIO write of 01f20001 FAULT at 00b020 oct 21 22:23:14 debian kernel: nouveau :00:0d.0: bus: MMIO write of FAULT at 00b020 oct 21 22:23:23 debian kernel: nouveau :00:0d.0: bus: MMIO write of 01f20001 FAULT at 00b020 oct 21 22:25:32 debian kernel: nouveau :00:0d.0: DRM: DCB type 4 not known oct 21 22:25:32 debian kernel: nouveau :00:0d.0: DRM: Unknown-1 has no encoders, removing oct 21 22:26:03 debian kernel: nouveau :00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000 oct 21 22:46:40 debian kernel: nouveau :00:0d.0: DRM: DCB type 4 not known oct 21 22:46:40 debian kernel: nouveau :00:0d.0: DRM: Unknown-1 has no encoders, removing oct 21 22:46:51 debian kernel: nouveau :00:0d.0: bus: MMIO write of 005c0001 FAULT at 00b000 oct 21 22:59:55 debian kernel: crct10dif_common nouveau video mxm_wmi wmi i2c_algo_bit drm_kms_helper ata_generic sata_nv cec libata ttm drm scsi_mod psmouse evdev serio_raw button oct 21 23:44:26 debian kernel: nouveau :00:0d.0: bus: MMIO write of 01570001 FAULT at 00b010 oct 21 23:44:26 debian kernel: nouveau :00:0d.0: bus: MMIO write of 01ef0001 FAULT at 00b020 oct 21 23:45:08 debian kernel: nouveau :00:0d.0: bus: MMIO write of 02550001 FAULT at 00b030 oct 21 23:45:09 debian kernel: nouveau
[Nouveau] [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED
MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a source page was already locked during migrate_vma_collect(). If it wasn't then the a second attempt is made to lock the page. However if the first attempt failed it's unlikely a second attempt will succeed, and the retry adds complexity. So clean this up by removing the retry and MIGRATE_PFN_LOCKED flag. Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag set, but nothing actually checks that. Signed-off-by: Alistair Popple --- Documentation/vm/hmm.rst | 2 +- arch/powerpc/kvm/book3s_hv_uvmem.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 - drivers/gpu/drm/nouveau/nouveau_dmem.c | 4 +- include/linux/migrate.h | 1 - lib/test_hmm.c | 5 +- mm/migrate.c | 145 +-- 7 files changed, 35 insertions(+), 128 deletions(-) diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst index a14c2938e7af..f2a59ed82ed3 100644 --- a/Documentation/vm/hmm.rst +++ b/Documentation/vm/hmm.rst @@ -360,7 +360,7 @@ between device driver specific code and shared common code: system memory page, locks the page with ``lock_page()``, and fills in the ``dst`` array entry with:: - dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + dst[i] = migrate_pfn(page_to_pfn(dpage)); Now that the driver knows that this page is being migrated, it can invalidate device private MMU mappings and copy device private memory diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c index a7061ee3b157..28c436df9935 100644 --- a/arch/powerpc/kvm/book3s_hv_uvmem.c +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c @@ -560,7 +560,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma, gpa, 0, page_shift); if (ret == U_SUCCESS) - *mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED; + *mig.dst = migrate_pfn(pfn); else { unlock_page(dpage); __free_page(dpage); @@ -774,7 +774,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma, } } - *mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + *mig.dst = migrate_pfn(page_to_pfn(dpage)); migrate_vma_pages(); out_finalize: migrate_vma_finalize(); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 4a16e3c257b9..41d9417f182b 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c @@ -300,7 +300,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange, migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]); svm_migrate_get_vram_page(prange, migrate->dst[i]); migrate->dst[i] = migrate_pfn(migrate->dst[i]); - migrate->dst[i] |= MIGRATE_PFN_LOCKED; src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE, DMA_TO_DEVICE); r = dma_mapping_error(dev, src[i]); @@ -580,7 +579,6 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange, dst[i] >> PAGE_SHIFT, page_to_pfn(dpage)); migrate->dst[i] = migrate_pfn(page_to_pfn(dpage)); - migrate->dst[i] |= MIGRATE_PFN_LOCKED; j++; } diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c index 92987daa5e17..3828aafd3ac4 100644 --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c @@ -166,7 +166,7 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm, goto error_dma_unmap; mutex_unlock(>mutex); - args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + args->dst[0] = migrate_pfn(page_to_pfn(dpage)); return 0; error_dma_unmap: @@ -602,7 +602,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm, ((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT); if (src & MIGRATE_PFN_WRITE) *pfn |= NVIF_VMM_PFNMAP_V0_W; - return migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED; + return migrate_pfn(page_to_pfn(dpage)); out_dma_unmap: dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL); diff --git a/include/linux/migrate.h b/include/linux/migrate.h index c8077e936691..479b861ae490 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -119,7 +119,6 @@ static inline int migrate_misplaced_page(struct page *page, */ #define MIGRATE_PFN_VALID (1UL << 0) #define MIGRATE_PFN_MIGRATE(1UL << 1) -#define MIGRATE_PFN_LOCKED (1UL << 2)
Re: [Nouveau] [Intel-gfx] [PATCH v3 13/13] drm/i915: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi
On Fri, Oct 22, 2021 at 03:22:57PM +0300, Ville Syrjälä wrote: > On Fri, Oct 22, 2021 at 03:01:52PM +0300, Ville Syrjälä wrote: > > On Fri, Oct 22, 2021 at 12:25:33PM +0200, Claudio Suarez wrote: > > > On Thu, Oct 21, 2021 at 04:49:59PM +0300, Ville Syrjälä wrote: > > > > On Wed, Oct 20, 2021 at 12:51:21AM +0200, Claudio Suarez wrote: > > > > > drm_get_edid() internally calls to > > > > > drm_connector_update_edid_property() > > > > > and then drm_add_display_info(), which parses the EDID. > > > > > This happens in the function intel_hdmi_set_edid() and > > > > > intel_sdvo_tmds_sink_detect() (via intel_sdvo_get_edid()). > > > > > > > > > > Once EDID is parsed, the monitor HDMI support information is available > > > > > through drm_display_info.is_hdmi. Retriving the same information with > > > > > drm_detect_hdmi_monitor() is less efficient. Change to > > > > > drm_display_info.is_hdmi > > > > > > > > I meant we need to examine all call chains that can lead to > > > > .detect() to make sure all of them do in fact update the > > > > display_info beforehand. > > > > > > Well, I studied it carefully and, yes, all call chains that can lead to > > > drm_display_info.is_hdmi / drm_detect_hdmi_monitor() update display_info > > > beforehand. In the case that this doesn't happen, the code is unchanged. > > > > > > Do you want I explain the changes in the code here again ? Or do you want > > > to me change the commit message to be more clear ? In the first case, I > > > can > > > write here a detailed explanation. In the second case I can make a longer > > > commit > > > message. > > > > > > Or both? > > > > I want all those call chains explained in the commit message, > > otherwise I have no easy way to confirm whether the change > > is correct or not. > > Hmm. OK, so I had a bit of a dig around and seems that what we do now > .detect()->drm_get_edid()->drm_connector_update_edid_property()->drm_add_display_info() Yes. I said before that I felt something was wrong when I read the documentation and then the code. To be more explicit now, I expected that drm_connector_update_edid_property() will be done in the fill_modes/get_modes phase instead of when reading the edid. The documentation suggests that but the code reads the edid in the detect phase. Now, since drm_connector_update_edid_property() is called in the detect phase, it is not necessary to keep the edid data in the private connector struct. It is in struct drm_connector from the beginning. But this is topic for another patch. > Now the question is when did that start happening? Looks like it was > commit 4b4df570b41d ("drm: Update edid-derived drm_display_info fields > at edid property set [v2]") that started to call drm_add_display_info() > from drm_connector_update_edid_property(), and then commit 5186421cbfe2 > ("drm: Introduce epoch counter to drm_connector") started to call > drm_connector_update_edid_property() from drm_get_edid(). Before both > of those commits were in place display_info would still contain > some stale garbage during .detect(). > > That is the story I think we want in these commit messages since it > a) explains why the old code was directly parsing the edid instead > b) why it's now safe to change this --commit-message? drm/i915: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi Commit a92d083d08b0 created the new flag is_hdmi in drm_display_info which is set when sink compliant with CEA-861 (EDID) shall be treated as an HDMI sink. >From that day, this value can be used in some cases instead of calling drm_detect_hdmi_monitor() and a second parse is avoided because drm_detect_hdmi_monitor() parses. A TODO task was registered in Documentation/gpu/todo.rst to perform that task in the future. The flag drm_display_info.is_hdmi is set in the function drm_add_display_info(), which is called from drm_connector_update_edid_property(). Since commit 5186421cbfe2, drm_get_edid() calls drm_connector_update_edid_property() when reading the edid data from an i2c adapter. Therefore, in these cases drm_display_info.is_hdmi is updated to its correct value when returning from drm_get_edid(). Replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi in the cases when drm_detect_hdmi_monitor() is called after a read from an i2c adapter using drm_get_edid() in the i915 driver. --- > > PS. connector->force handling in drm_get_edid() looks a bit busted > since it doesn't call drm_connector_update_edid_property() at all > in some cases. I think there might be some path that leads there > anywya if/when we change connector->force, but we should fix > drm_get_edid() to do the right thing regarless. In those cases, the edid isn't read and NULL is returned by drm_get_edid(). No problem because display_info.is_hdmi is inside an if (edid != NULL). BTW, struct intel_connector is allocated with kzalloc, so the initial value of is_hdmi is