Re: [RFT PATCH v2 19/48] drm/panel: novatek-nt36672a: Don't call unprepare+disable at shutdown/remove
Sorry for the long lines. Had to use a different email client and ended messing it up. Regards Joel Selvaraj From: Joel Selvaraj Sent: 05 May 2024 05:28 To: Douglas Anderson; dri-devel@lists.freedesktop.org; Maxime Ripard Cc: Linus Walleij; Chris Morgan; Yuran Pereira; Neil Armstrong; Sumit Semwal; Benni Steini; Marijn Suijten; Daniel Vetter; David Airlie; Jessica Zhang; Maarten Lankhorst; Sam Ravnborg; Thomas Zimmermann; linux-ker...@vger.kernel.org Subject: Re: [RFT PATCH v2 19/48] drm/panel: novatek-nt36672a: Don't call unprepare+disable at shutdown/remove Hi Douglas Anderson, Poco F1 is one of the main user for this panel. I tested the patch in my Poco F1 running postmarketOS. It works fine. Thank you. The panel itself requires other extra fixes to work properly which I intend to upstream in the upcoming weeks. But your patch doesn't break anything and works as expected. So. Tested-by: Joel Selvaraj Regards Joel Selvaraj From: Douglas Anderson Sent: 04 May 2024 03:03 To: dri-devel@lists.freedesktop.org; Maxime Ripard Cc: Linus Walleij; Chris Morgan; Yuran Pereira; Neil Armstrong; Douglas Anderson; Sumit Semwal; Benni Steini; Joel Selvaraj; Marijn Suijten; Daniel Vetter; David Airlie; Jessica Zhang; Maarten Lankhorst; Sam Ravnborg; Thomas Zimmermann; linux-ker...@vger.kernel.org Subject: [RFT PATCH v2 19/48] drm/panel: novatek-nt36672a: Don't call unprepare+disable at shutdown/remove It's the responsibility of a correctly written DRM modeset driver to call drm_atomic_helper_shutdown() at shutdown time and that should be disabling / unpreparing the panel if needed. Panel drivers shouldn't be calling these functions themselves. A recent effort was made to fix as many DRM modeset drivers as possible [1] [2] [3] and most drivers are fixed now. A grep through mainline for compatible strings used by this driver indicates that it is used by Qualcomm boards. The Qualcomm driver appears to be correctly calling drm_atomic_helper_shutdown() so we can remove the calls. [1] https://lore.kernel.org/r/20230901234015.566018-1-diand...@chromium.org [2] https://lore.kernel.org/r/20230901234202.566951-1-diand...@chromium.org [3] https://lore.kernel.org/r/20230921192749.1542462-1-diand...@chromium.org Cc: Sumit Semwal Cc: Benni Steini Cc: Joel Selvaraj Cc: Marijn Suijten Signed-off-by: Douglas Anderson --- Changes in v2: - Only handle 1 panel per patch. - Split removal of prepared/enabled from handling of remove/shutdown. drivers/gpu/drm/panel/panel-novatek-nt36672a.c | 17 - 1 file changed, 17 deletions(-) diff --git a/drivers/gpu/drm/panel/panel-novatek-nt36672a.c b/drivers/gpu/drm/panel/panel-novatek-nt36672a.c index 35aace79613a..c2abd20e0734 100644 --- a/drivers/gpu/drm/panel/panel-novatek-nt36672a.c +++ b/drivers/gpu/drm/panel/panel-novatek-nt36672a.c @@ -656,14 +656,6 @@ static void nt36672a_panel_remove(struct mipi_dsi_device *dsi) struct nt36672a_panel *pinfo = mipi_dsi_get_drvdata(dsi); int err; - err = drm_panel_unprepare(&pinfo->base); - if (err < 0) - dev_err(&dsi->dev, "failed to unprepare panel: %d\n", err); - - err = drm_panel_disable(&pinfo->base); - if (err < 0) - dev_err(&dsi->dev, "failed to disable panel: %d\n", err); - err = mipi_dsi_detach(dsi); if (err < 0) dev_err(&dsi->dev, "failed to detach from DSI host: %d\n", err); @@ -671,14 +663,6 @@ static void nt36672a_panel_remove(struct mipi_dsi_device *dsi) drm_panel_remove(&pinfo->base); } -static void nt36672a_panel_shutdown(struct mipi_dsi_device *dsi) -{ - struct nt36672a_panel *pinfo = mipi_dsi_get_drvdata(dsi); - - drm_panel_disable(&pinfo->base); - drm_panel_unprepare(&pinfo->base); -} - static const struct of_device_id tianma_fhd_video_of_match[] = { { .compatible = "tianma,fhd-video", .data = &tianma_fhd_video_panel_desc }, { }, @@ -692,7 +676,6 @@ static struct mipi_dsi_driver nt36672a_panel_driver = { }, .probe = nt36672a_panel_probe, .remove = nt36672a_panel_remove, - .shutdown = nt36672a_panel_shutdown, }; module_mipi_dsi_driver(nt36672a_panel_driver); -- 2.45.0.rc1.225.g2a3ae87e7f-goog
Re: [RFT PATCH v2 18/48] drm/panel: novatek-nt36672a: Stop tracking prepared
Hi Douglas Anderson, Poco F1 is one of the main user for this panel. I tested the patch in my Poco F1 running postmarketOS. It works fine. Thank you. The panel itself requires other extra fixes to work properly which I intend to upstream in the upcoming weeks. But your patch doesn't break anything and works as expected. So. Tested-by: Joel Selvaraj Regards Joel Selvaraj From: Douglas Anderson Sent: 04 May 2024 03:02 To: dri-devel@lists.freedesktop.org; Maxime Ripard Cc: Linus Walleij; Chris Morgan; Yuran Pereira; Neil Armstrong; Douglas Anderson; Sumit Semwal; Benni Steini; Joel Selvaraj; Marijn Suijten; Daniel Vetter; David Airlie; Jessica Zhang; Maarten Lankhorst; Sam Ravnborg; Thomas Zimmermann; linux-ker...@vger.kernel.org Subject: [RFT PATCH v2 18/48] drm/panel: novatek-nt36672a: Stop tracking prepared As talked about in commit d2aacaf07395 ("drm/panel: Check for already prepared/enabled in drm_panel"), we want to remove needless code from panel drivers that was storing and double-checking the prepared/enabled state. Even if someone was relying on the double-check before, that double-check is now in the core and not needed in individual drivers. Cc: Sumit Semwal Cc: Benni Steini Cc: Joel Selvaraj Cc: Marijn Suijten Signed-off-by: Douglas Anderson --- Changes in v2: - Only handle 1 panel per patch. - Split removal of prepared/enabled from handling of remove/shutdown. drivers/gpu/drm/panel/panel-novatek-nt36672a.c | 12 1 file changed, 12 deletions(-) diff --git a/drivers/gpu/drm/panel/panel-novatek-nt36672a.c b/drivers/gpu/drm/panel/panel-novatek-nt36672a.c index 3886372415c2..35aace79613a 100644 --- a/drivers/gpu/drm/panel/panel-novatek-nt36672a.c +++ b/drivers/gpu/drm/panel/panel-novatek-nt36672a.c @@ -72,8 +72,6 @@ struct nt36672a_panel { struct regulator_bulk_data supplies[ARRAY_SIZE(nt36672a_regulator_names)]; struct gpio_desc *reset_gpio; - - bool prepared; }; static inline struct nt36672a_panel *to_nt36672a_panel(struct drm_panel *panel) @@ -119,9 +117,6 @@ static int nt36672a_panel_unprepare(struct drm_panel *panel) struct nt36672a_panel *pinfo = to_nt36672a_panel(panel); int ret; - if (!pinfo->prepared) - return 0; - /* send off cmds */ ret = nt36672a_send_cmds(panel, pinfo->desc->off_cmds, pinfo->desc->num_off_cmds); @@ -147,8 +142,6 @@ static int nt36672a_panel_unprepare(struct drm_panel *panel) if (ret < 0) dev_err(panel->dev, "power_off failed ret = %d\n", ret); - pinfo->prepared = false; - return ret; } @@ -179,9 +172,6 @@ static int nt36672a_panel_prepare(struct drm_panel *panel) struct nt36672a_panel *pinfo = to_nt36672a_panel(panel); int err; - if (pinfo->prepared) - return 0; - err = nt36672a_panel_power_on(pinfo); if (err < 0) goto poweroff; @@ -221,8 +211,6 @@ static int nt36672a_panel_prepare(struct drm_panel *panel) msleep(120); - pinfo->prepared = true; - return 0; poweroff: -- 2.45.0.rc1.225.g2a3ae87e7f-goog
Re: [RFT PATCH v2 19/48] drm/panel: novatek-nt36672a: Don't call unprepare+disable at shutdown/remove
Hi Douglas Anderson, Poco F1 is one of the main user for this panel. I tested the patch in my Poco F1 running postmarketOS. It works fine. Thank you. The panel itself requires other extra fixes to work properly which I intend to upstream in the upcoming weeks. But your patch doesn't break anything and works as expected. So. Tested-by: Joel Selvaraj Regards Joel Selvaraj From: Douglas Anderson Sent: 04 May 2024 03:03 To: dri-devel@lists.freedesktop.org; Maxime Ripard Cc: Linus Walleij; Chris Morgan; Yuran Pereira; Neil Armstrong; Douglas Anderson; Sumit Semwal; Benni Steini; Joel Selvaraj; Marijn Suijten; Daniel Vetter; David Airlie; Jessica Zhang; Maarten Lankhorst; Sam Ravnborg; Thomas Zimmermann; linux-ker...@vger.kernel.org Subject: [RFT PATCH v2 19/48] drm/panel: novatek-nt36672a: Don't call unprepare+disable at shutdown/remove It's the responsibility of a correctly written DRM modeset driver to call drm_atomic_helper_shutdown() at shutdown time and that should be disabling / unpreparing the panel if needed. Panel drivers shouldn't be calling these functions themselves. A recent effort was made to fix as many DRM modeset drivers as possible [1] [2] [3] and most drivers are fixed now. A grep through mainline for compatible strings used by this driver indicates that it is used by Qualcomm boards. The Qualcomm driver appears to be correctly calling drm_atomic_helper_shutdown() so we can remove the calls. [1] https://lore.kernel.org/r/20230901234015.566018-1-diand...@chromium.org [2] https://lore.kernel.org/r/20230901234202.566951-1-diand...@chromium.org [3] https://lore.kernel.org/r/20230921192749.1542462-1-diand...@chromium.org Cc: Sumit Semwal Cc: Benni Steini Cc: Joel Selvaraj Cc: Marijn Suijten Signed-off-by: Douglas Anderson --- Changes in v2: - Only handle 1 panel per patch. - Split removal of prepared/enabled from handling of remove/shutdown. drivers/gpu/drm/panel/panel-novatek-nt36672a.c | 17 - 1 file changed, 17 deletions(-) diff --git a/drivers/gpu/drm/panel/panel-novatek-nt36672a.c b/drivers/gpu/drm/panel/panel-novatek-nt36672a.c index 35aace79613a..c2abd20e0734 100644 --- a/drivers/gpu/drm/panel/panel-novatek-nt36672a.c +++ b/drivers/gpu/drm/panel/panel-novatek-nt36672a.c @@ -656,14 +656,6 @@ static void nt36672a_panel_remove(struct mipi_dsi_device *dsi) struct nt36672a_panel *pinfo = mipi_dsi_get_drvdata(dsi); int err; - err = drm_panel_unprepare(&pinfo->base); - if (err < 0) - dev_err(&dsi->dev, "failed to unprepare panel: %d\n", err); - - err = drm_panel_disable(&pinfo->base); - if (err < 0) - dev_err(&dsi->dev, "failed to disable panel: %d\n", err); - err = mipi_dsi_detach(dsi); if (err < 0) dev_err(&dsi->dev, "failed to detach from DSI host: %d\n", err); @@ -671,14 +663,6 @@ static void nt36672a_panel_remove(struct mipi_dsi_device *dsi) drm_panel_remove(&pinfo->base); } -static void nt36672a_panel_shutdown(struct mipi_dsi_device *dsi) -{ - struct nt36672a_panel *pinfo = mipi_dsi_get_drvdata(dsi); - - drm_panel_disable(&pinfo->base); - drm_panel_unprepare(&pinfo->base); -} - static const struct of_device_id tianma_fhd_video_of_match[] = { { .compatible = "tianma,fhd-video", .data = &tianma_fhd_video_panel_desc }, { }, @@ -692,7 +676,6 @@ static struct mipi_dsi_driver nt36672a_panel_driver = { }, .probe = nt36672a_panel_probe, .remove = nt36672a_panel_remove, - .shutdown = nt36672a_panel_shutdown, }; module_mipi_dsi_driver(nt36672a_panel_driver); -- 2.45.0.rc1.225.g2a3ae87e7f-goog
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Sat, 4 May 2024 at 08:32, Linus Torvalds wrote: > > Lookie here, the fundamental issue is that epoll can call '->poll()' > on a file descriptor that is being closed concurrently. Thinking some more about this, and replying to myself... Actually, I wonder if we could *really* fix this by simply moving the eventpoll_release() to where it really belongs. If we did it in file_close_fd_locked(), it would actually make a *lot* more sense. Particularly since eventpoll actually uses this: struct epoll_filefd { struct file *file; int fd; } __packed; ie it doesn't just use the 'struct file *', it uses the 'fd' itself (for ep_find()). (Strictly speaking, it should also have a pointer to the 'struct files_struct' to make the 'int fd' be meaningful). IOW, eventpoll already considers the file _descriptor_ relevant, not just the file pointer, and that's destroyed at *close* time, not at 'fput()' time. Yeah, yeah, the locking situation in file_close_fd_locked() is a bit inconvenient, but if we can solve that, it would solve the problem in a fundamentally different way: remove the ep iterm before the file->f_count has actually been decremented, so the whole "race with fput()" would just go away entirely. I dunno. I think that would be the right thing to do, but I wouldn't be surprised if some disgusting eventpoll user then might depend on the current situation where the eventpoll thing stays around even after the close() if you have another copy of the file open. Linus
Re: [PATCH] drm/amd/pm: Fix error code in vega10_hwmgr_backend_init()
> Return -EINVAL on error instead of success. Also on the success path, > return a literal zero instead of "return result;" How do you think about to omit the initialisation for the variable “result” in another update step? Regards, Markus
[PATCH] drm/msm/a6xx: Cleanup indexed regs const'ness
From: Rob Clark These tables were made non-const in commit 3cba4a2cdff3 ("drm/msm/a6xx: Update ROQ size in coredump") in order to avoid powering up the GPU when reading back a devcoredump. Instead let's just stash the count that is potentially read from hw in struct a6xx_gpu_state_obj, and make the tables const again. Signed-off-by: Rob Clark --- drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c | 15 +-- drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h | 8 2 files changed, 13 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c index 77146d30bcaa..0a7717a4fc2f 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c @@ -24,6 +24,7 @@ struct a6xx_gpu_state_obj { const void *handle; u32 *data; + u32 count; /* optional, used when count potentially read from hw */ }; struct a6xx_gpu_state { @@ -1437,16 +1438,18 @@ static u32 a7xx_get_cp_roq_size(struct msm_gpu *gpu) /* Read a block of data from an indexed register pair */ static void a6xx_get_indexed_regs(struct msm_gpu *gpu, struct a6xx_gpu_state *a6xx_state, - struct a6xx_indexed_registers *indexed, + const struct a6xx_indexed_registers *indexed, struct a6xx_gpu_state_obj *obj) { + u32 count = indexed->count; int i; obj->handle = (const void *) indexed; if (indexed->count_fn) - indexed->count = indexed->count_fn(gpu); + count = indexed->count_fn(gpu); - obj->data = state_kcalloc(a6xx_state, indexed->count, sizeof(u32)); + obj->data = state_kcalloc(a6xx_state, count, sizeof(u32)); + obj->count = count; if (!obj->data) return; @@ -1454,7 +1457,7 @@ static void a6xx_get_indexed_regs(struct msm_gpu *gpu, gpu_write(gpu, indexed->addr, 0); /* Read the data - each read increments the internal address by 1 */ - for (i = 0; i < indexed->count; i++) + for (i = 0; i < count; i++) obj->data[i] = gpu_read(gpu, indexed->data); } @@ -1890,9 +1893,9 @@ static void a6xx_show_indexed_regs(struct a6xx_gpu_state_obj *obj, return; print_name(p, " - regs-name: ", indexed->name); - drm_printf(p, "dwords: %d\n", indexed->count); + drm_printf(p, "dwords: %d\n", obj->count); - print_ascii85(p, indexed->count << 2, obj->data); + print_ascii85(p, obj->count << 2, obj->data); } static void a6xx_show_debugbus_block(const struct a6xx_debugbus_block *block, diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h index 3b1ba514e8ee..dd4c28a8d923 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.h @@ -397,7 +397,7 @@ struct a6xx_indexed_registers { u32 (*count_fn)(struct msm_gpu *gpu); }; -static struct a6xx_indexed_registers a6xx_indexed_reglist[] = { +static const struct a6xx_indexed_registers a6xx_indexed_reglist[] = { { "CP_SQE_STAT", REG_A6XX_CP_SQE_STAT_ADDR, REG_A6XX_CP_SQE_STAT_DATA, 0x33, NULL }, { "CP_DRAW_STATE", REG_A6XX_CP_DRAW_STATE_ADDR, @@ -408,7 +408,7 @@ static struct a6xx_indexed_registers a6xx_indexed_reglist[] = { REG_A6XX_CP_ROQ_DBG_DATA, 0, a6xx_get_cp_roq_size}, }; -static struct a6xx_indexed_registers a7xx_indexed_reglist[] = { +static const struct a6xx_indexed_registers a7xx_indexed_reglist[] = { { "CP_SQE_STAT", REG_A6XX_CP_SQE_STAT_ADDR, REG_A6XX_CP_SQE_STAT_DATA, 0x33, NULL }, { "CP_DRAW_STATE", REG_A6XX_CP_DRAW_STATE_ADDR, @@ -433,12 +433,12 @@ static struct a6xx_indexed_registers a7xx_indexed_reglist[] = { REG_A6XX_CP_ROQ_DBG_DATA, 0, a7xx_get_cp_roq_size }, }; -static struct a6xx_indexed_registers a6xx_cp_mempool_indexed = { +static const struct a6xx_indexed_registers a6xx_cp_mempool_indexed = { "CP_MEMPOOL", REG_A6XX_CP_MEM_POOL_DBG_ADDR, REG_A6XX_CP_MEM_POOL_DBG_DATA, 0x2060, NULL, }; -static struct a6xx_indexed_registers a7xx_cp_bv_mempool_indexed[] = { +static const struct a6xx_indexed_registers a7xx_cp_bv_mempool_indexed[] = { { "CP_MEMPOOL", REG_A6XX_CP_MEM_POOL_DBG_ADDR, REG_A6XX_CP_MEM_POOL_DBG_DATA, 0x2100, NULL }, { "CP_BV_MEMPOOL", REG_A7XX_CP_BV_MEM_POOL_DBG_ADDR, -- 2.44.0
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Sat, 4 May 2024 at 08:40, Linus Torvalds wrote: > > And maybe it's even *only* dma-buf that does that fget() in its > ->poll() function. Even *then* it's not a dma-buf.c bug. They all do in the sense that they do poll_wait -> __pollwait -> get_file (*boom*) but the boom is very small because the poll_wait() will be undone by poll_freewait(), and normally poll/select has held the file count elevated. Except for epoll. Which leaves those pollwait entries around until it's done - but again will be held up on the ep->mtx before it does so. So everybody does some f_count games, but possibly dma-buf is the only one that ends up expecting to hold on to the f_count for longer periods. Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Sat, 4 May 2024 at 08:32, Linus Torvalds wrote: > > Now, during this TOTALLY INNOCENT sock_poll(), in another thread, the > file closing completes, eventpoll_release() finishes [..] Actually, Al is right that ep_item_poll() should be holding the ep->mtx, so eventpoll_release() -> eventpoll_release_file_file() -> mutex_lock(&ep->mtx) should block and the file doesn't actually get released. So I guess the sock_poll() issue cannot happen. It does need some poll() function that does 'fget()', and believes that it works. But because the f_count has already gone down to zero, fget() doesn't work, and doesn't keep the file around, and you have the bug. The cases that do fget() in poll() are probably race, but they aren't buggy. epoll is buggy. So my example wasn't going to work, but the argument isn't really any different, it's just a much more limited case that breaks. And maybe it's even *only* dma-buf that does that fget() in its ->poll() function. Even *then* it's not a dma-buf.c bug. Linus
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Sat, 4 May 2024 at 02:37, Christian Brauner wrote: > > --- a/drivers/dma-buf/dma-buf.c > +++ b/drivers/dma-buf/dma-buf.c > @@ -244,13 +244,18 @@ static __poll_t dma_buf_poll(struct file *file, > poll_table *poll) > if (!dmabuf || !dmabuf->resv) > return EPOLLERR; > > + if (!get_file_active(&dmabuf->file)) > + return EPOLLERR; [...] I *really* don't think anything that touches dma-buf.c can possibly be right. This is not a dma-buf.c bug. This is purely an epoll bug. Lookie here, the fundamental issue is that epoll can call '->poll()' on a file descriptor that is being closed concurrently. That means that *ANY* driver that relies on *any* data structure that is managed by the lifetime of the 'struct file' will have problems. Look, here's sock_poll(): static __poll_t sock_poll(struct file *file, poll_table *wait) { struct socket *sock = file->private_data; and that first line looks about as innocent as it possibly can, right? Now, imagine that this is called from 'epoll' concurrently with the file being closed for the last time (but it just hasn't _quite_ reached eventpoll_release() yet). Now, imagine that the kernel is built with preemption, and the epoll thread gets preempted _just_ before it loads 'file->private_data'. Furthermore, the machine is under heavy load, and it just stays off its CPU a long time. Now, during this TOTALLY INNOCENT sock_poll(), in another thread, the file closing completes, eventpoll_release() finishes, and the preemption of the poll() thing just takes so long that you go through an RCU period too, so that the actual file has been released too. So now that totally innoced file->private_data load in the poll() is probably going to get random data. Yes, the file is allocated as SLAB_TYPESAFE_BY_RCU, so it's probably still a file. Not guaranteed, even the slab will get fully free'd in some situations. And yes, the above case is impossible to hit in practice. You have to hit quite the small race window with an operation that practically never happens in the first place. But my point is that the fact that the problem with file->f_count lifetimes happens for that dmabuf driver is not the fault of the dmabuf code. Not at all. It is *ENTIRELY* a bug in epoll, and the dmabuf code is probably just easier to hit because it has a poll() function that does things that have longer lifetimes than most things, and interacts more directly with that f_count. So I really don't understand why Al thinks this is "dmabuf does bad things with f_count". It damn well does not. dma-buf is the GOOD GUY here. It's doing things *PROPERLY*. It's taking refcounts like it damn well should. The fact that it takes ref-counts on something that the epoll code has messed up is *NOT* its fault. Linus
Re: [PATCH] fbdev: Have CONFIG_FB_NOTIFY be tristate
Hi Florian, kernel test robot noticed the following build errors: [auto build test ERROR on drm-misc/drm-misc-next] [also build test ERROR on drm-tip/drm-tip linus/master v6.9-rc6 next-20240503] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Florian-Fainelli/fbdev-Have-CONFIG_FB_NOTIFY-be-tristate/20240504-033139 base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next patch link: https://lore.kernel.org/r/20240503192858.103640-1-florian.fainelli%40broadcom.com patch subject: [PATCH] fbdev: Have CONFIG_FB_NOTIFY be tristate config: i386-randconfig-015-20240504 (https://download.01.org/0day-ci/archive/20240504/202405042242.ixldu4xj-...@intel.com/config) compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240504/202405042242.ixldu4xj-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202405042242.ixldu4xj-...@intel.com/ All errors (new ones prefixed by >>): ld: drivers/leds/trigger/ledtrig-backlight.o: in function `bl_trig_deactivate': >> drivers/leds/trigger/ledtrig-backlight.c:128:(.text+0x136): undefined >> reference to `fb_unregister_client' ld: drivers/leds/trigger/ledtrig-backlight.o: in function `bl_trig_activate': >> drivers/leds/trigger/ledtrig-backlight.c:117:(.text+0x1aa): undefined >> reference to `fb_register_client' vim +128 drivers/leds/trigger/ledtrig-backlight.c e4786ba0db7b11 drivers/leds/trigger/ledtrig-backlight.c Uwe Kleine-König 2018-07-02 100 2282e125a406e0 drivers/leds/trigger/ledtrig-backlight.c Uwe Kleine-König 2018-07-02 101 static int bl_trig_activate(struct led_classdev *led) 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 102 { 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 103 int ret; 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 104 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 105 struct bl_trig_notifier *n; 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 106 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 107 n = kzalloc(sizeof(struct bl_trig_notifier), GFP_KERNEL); e4786ba0db7b11 drivers/leds/trigger/ledtrig-backlight.c Uwe Kleine-König 2018-07-02 108 if (!n) e4786ba0db7b11 drivers/leds/trigger/ledtrig-backlight.c Uwe Kleine-König 2018-07-02 109 return -ENOMEM; e4786ba0db7b11 drivers/leds/trigger/ledtrig-backlight.c Uwe Kleine-König 2018-07-02 110 led_set_trigger_data(led, n); 9f9455ae710786 drivers/leds/ledtrig-backlight.c Janusz Krzysztofik 2011-01-12 111 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 112 n->led = led; 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 113 n->brightness = led->brightness; 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 114 n->old_status = UNBLANK; 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 115 n->notifier.notifier_call = fb_notifier_callback; 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 116 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 @117 ret = fb_register_client(&n->notifier); 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 118 if (ret) 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 119 dev_err(led->dev, "unable to register backlight trigger\n"); 2282e125a406e0 drivers/leds/trigger/ledtrig-backlight.c Uwe Kleine-König 2018-07-02 120 2282e125a406e0 drivers/leds/trigger/ledtrig-backlight.c Uwe Kleine-König 2018-07-02 121 return 0; 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 122 } 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 123 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 124 static void bl_trig_deactivate(struct led_classdev *led) 132e9306beedd0 drivers/leds/ledtrig-backlight.c Rodolfo Giometti 2008-10-13 125 { e4786ba0db7b11 drivers/leds/trigger/ledtrig-backlight.c Uwe Kleine-König 2018-07-0
Re: [PATCH v7 16/18] drm/mediatek: Support CRC in display driver
Hi Shawn, kernel test robot noticed the following build errors: [auto build test ERROR on next-20240501] [cannot apply to v6.9-rc6 v6.9-rc5 v6.9-rc4 linus/master v6.9-rc6] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Shawn-Sung/soc-mediatek-Disable-9-bit-alpha-in-ETHDR/20240502-184103 base: next-20240501 patch link: https://lore.kernel.org/r/20240502103848.5845-17-shawn.sung%40mediatek.com patch subject: [PATCH v7 16/18] drm/mediatek: Support CRC in display driver config: arm64-defconfig (https://download.01.org/0day-ci/archive/20240504/202405042214.vhrmelku-...@intel.com/config) compiler: aarch64-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240504/202405042214.vhrmelku-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202405042214.vhrmelku-...@intel.com/ All errors (new ones prefixed by >>): drivers/gpu/drm/mediatek/mtk_crtc.c: In function 'mtk_crtc_create_crc_cmdq': >> drivers/gpu/drm/mediatek/mtk_crtc.c:1362:9: error: too few arguments to >> function 'cmdq_pkt_jump' 1362 | cmdq_pkt_jump(&crc->cmdq_handle, crc->cmdq_handle.pa_base); | ^ In file included from drivers/gpu/drm/mediatek/mtk_crtc.c:11: include/linux/soc/mediatek/mtk-cmdq.h:319:19: note: declared here 319 | static inline int cmdq_pkt_jump(struct cmdq_pkt *pkt, dma_addr_t addr, u8 shift_pa) | ^ vim +/cmdq_pkt_jump +1362 drivers/gpu/drm/mediatek/mtk_crtc.c 1267 1268 #if IS_REACHABLE(CONFIG_MTK_CMDQ) 1269 /** 1270 * mtk_crtc_create_crc_cmdq - Create a CMDQ thread for syncing the CRCs 1271 * @dev: Kernel device node of the CRC provider 1272 * @crc: Pointer of the CRC to init 1273 * 1274 * This function will create a looping thread on GCE (Global Command Engine) to 1275 * keep the CRC up to date by monitoring the assigned event (usually the frame 1276 * done event) of the CRC provider, and read the CRCs from the registers to a 1277 * shared memory for the workqueue to read. To start/stop the looping thread, 1278 * please call `mtk_crtc_start_crc_cmdq()` and `mtk_crtc_stop_crc_cmdq()` 1279 * defined blow. 1280 * 1281 * The reason why we don't update the CRCs with CPU is that the front porch of 1282 * 4K60 timing in CEA-861 is less than 60us, and register read/write speed is 1283 * relatively unreliable comparing to GCE due to the bus design. 1284 * 1285 * We must create a new thread instead of using the original one for plane 1286 * update is because: 1287 * 1. We cannot add another wait-for-event command at the end of cmdq packet, or 1288 *the cmdq callback will delay for too long 1289 * 2. Will get the CRC of the previous frame if using the existed wait-for-event 1290 *command which is at the beginning of the packet 1291 */ 1292 void mtk_crtc_create_crc_cmdq(struct device *dev, struct mtk_crtc_crc *crc) 1293 { 1294 int i; 1295 1296 if (!crc->cnt) { 1297 dev_warn(dev, "%s: not support\n", __func__); 1298 goto cleanup; 1299 } 1300 1301 if (!crc->ofs) { 1302 dev_warn(dev, "%s: not defined\n", __func__); 1303 goto cleanup; 1304 } 1305 1306 crc->cmdq_client.client.dev = dev; 1307 crc->cmdq_client.client.tx_block = false; 1308 crc->cmdq_client.client.knows_txdone = true; 1309 crc->cmdq_client.client.rx_callback = NULL; 1310 crc->cmdq_client.chan = mbox_request_channel(&crc->cmdq_client.client, 0); 1311 if (IS_ERR(crc->cmdq_client.chan)) { 1312 dev_warn(dev, "%s: failed to create mailbox client\n", __func__); 1313 crc->cmdq_client.chan = NULL; 1314 goto cleanup; 1315 } 1316 1317 if (mtk_drm_cmdq_pkt_create(&crc->cmdq_client, &crc->cmdq_handle, PAGE_SIZE)) { 1318 dev_warn(dev, "%s: failed to create cmdq packet\n", __func__); 1319 goto cleanup; 1320 } 1321 1322 if (!crc->va) { 1323 dev_warn(dev, "%s: no memory\n", __func__); 1324 goto cleanup; 1325 } 1326 1327 /* map the entry to get a dma ad
Re: [BUG] drm: zynqmp_dp: Lockup in zynqmp_dp_bridge_detect when device is unbound
Hi Sean, On Fri, May 03, 2024 at 05:54:32PM -0400, Sean Anderson wrote: > Hi, > > I have discovered a bug in the displayport driver on drm-misc-next. To > trigger it, run > > echo fd4a.display > /sys/bus/platform/drivers/zynqmp-dpsub/unbind > > The system will become unresponsive and (after a bit) splat with a hard > LOCKUP. One core will be unresponsive at the first zynqmp_dp_read in > zynqmp_dp_bridge_detect. > > I believe the issue is due the registers being unmapped and the block > put into reset in zynqmp_dp_remove instead of zynqmp_dpsub_release. That is on purpose. Drivers are not allowed to access the device at all after .remove() returns. > This > could be resolved by deferring things until zynqmp_dpsub_release > (requiring us to skip devm_*), or by adding a flag to struct zynqmp_dp > and checking it before each callback. A subsystem-level implementation > might be better for the latter. > > For a better traceback, try applying the below patch and running the > following commands before triggering the lockup: > > echo 4 > /sys/module/drm/parameters/debug > echo 8 > /proc/sys/kernel/printk > > diff --git a/drivers/gpu/drm/xlnx/zynqmp_dp.c > b/drivers/gpu/drm/xlnx/zynqmp_dp.c > index 9df068a413f3..17b477b14ab5 100644 > --- a/drivers/gpu/drm/xlnx/zynqmp_dp.c > +++ b/drivers/gpu/drm/xlnx/zynqmp_dp.c > @@ -296,6 +296,7 @@ struct zynqmp_dp_config { > * @train_set: set of training data > */ > struct zynqmp_dp { > + unsigned long long magic; > struct device *dev; > struct zynqmp_dpsub *dpsub; > void __iomem *iomem; > @@ -1533,6 +1534,8 @@ static enum drm_connector_status > zynqmp_dp_bridge_detect(struct drm_bridge *brid > u32 state, i; > int ret; > > + WARN_ON(dp->magic != 0x0123456789abcdefULL); > + > /* > * This is from heuristic. It takes some delay (ex, 100 ~ 500 msec) to > * get the HPD signal with some monitors. > @@ -1723,6 +1726,7 @@ int zynqmp_dp_probe(struct zynqmp_dpsub *dpsub) > if (!dp) > return -ENOMEM; > > + dp->magic = 0x0123456789abcdefULL; > dp->dev = &pdev->dev; > dp->dpsub = dpsub; > dp->status = connector_status_disconnected; > @@ -1839,4 +1843,5 @@ void zynqmp_dp_remove(struct zynqmp_dpsub *dpsub) > > zynqmp_dp_phy_exit(dp); > zynqmp_dp_reset(dp, true); > + dp->magic = 0xdeadbeefdeadbeefULL; > } -- Regards, Laurent Pinchart
Re: [PATCH] fbdev: Have CONFIG_FB_NOTIFY be tristate
Hi Florian, kernel test robot noticed the following build errors: [auto build test ERROR on drm-misc/drm-misc-next] [also build test ERROR on drm-tip/drm-tip linus/master v6.9-rc6 next-20240503] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Florian-Fainelli/fbdev-Have-CONFIG_FB_NOTIFY-be-tristate/20240504-033139 base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next patch link: https://lore.kernel.org/r/20240503192858.103640-1-florian.fainelli%40broadcom.com patch subject: [PATCH] fbdev: Have CONFIG_FB_NOTIFY be tristate config: powerpc-randconfig-001-20240504 (https://download.01.org/0day-ci/archive/20240504/202405041939.mnsiecv5-...@intel.com/config) compiler: powerpc-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240504/202405041939.mnsiecv5-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202405041939.mnsiecv5-...@intel.com/ All errors (new ones prefixed by >>): powerpc-linux-ld: drivers/leds/trigger/ledtrig-backlight.o: in function `bl_trig_deactivate': >> ledtrig-backlight.c:(.text+0x360): undefined reference to >> `fb_unregister_client' powerpc-linux-ld: drivers/leds/trigger/ledtrig-backlight.o: in function `bl_trig_activate': >> ledtrig-backlight.c:(.text+0x3e8): undefined reference to >> `fb_register_client' -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
[PATCH] drm/amdgpu: delete unnecessary check
The "ret" variable is zero. No need to check. Signed-off-by: Dan Carpenter --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index a037e8fba29f..4d50fb039509 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -2807,7 +2807,7 @@ static void amdgpu_ras_do_page_retirement(struct work_struct *work) static void amdgpu_ras_poison_creation_handler(struct amdgpu_device *adev, uint32_t timeout_ms) { - int ret = 0; + int ret; struct ras_ecc_log_info *ecc_log; struct ras_query_if info; uint32_t timeout = timeout_ms; @@ -2836,8 +2836,7 @@ static void amdgpu_ras_poison_creation_handler(struct amdgpu_device *adev, return; } - if (!ret) - schedule_delayed_work(&ras->page_retirement_dwork, 0); + schedule_delayed_work(&ras->page_retirement_dwork, 0); } static int amdgpu_ras_poison_consumption_handler(struct amdgpu_device *adev, -- 2.43.0
[PATCH] drm/amd/pm: Fix error code in vega10_hwmgr_backend_init()
Return -EINVAL on error instead of success. Also on the success path, return a literal zero instead of "return result;" Fixes: e098bc9612c2 ("drm/amd/pm: optimize the power related source code layout") Signed-off-by: Dan Carpenter --- drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c index 37c915d7723c..9b9f8615070a 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c @@ -924,7 +924,7 @@ static int vega10_hwmgr_backend_init(struct pp_hwmgr *hwmgr) data->total_active_cus = adev->gfx.cu_info.number; if (!hwmgr->not_vf) - return result; + return -EINVAL; /* Setup default Overdrive Fan control settings */ data->odn_fan_table.target_fan_speed = @@ -947,7 +947,7 @@ static int vega10_hwmgr_backend_init(struct pp_hwmgr *hwmgr) "Mem Channel Index Exceeded maximum!", return -EINVAL); - return result; + return 0; } static int vega10_init_sclk_threshold(struct pp_hwmgr *hwmgr) -- 2.43.0
Re: [PATCH] staging: fb_tinylcd Alignment to open parenthesis
On Fri, May 03, 2024 at 04:20:24PM +, Ashok Kumar wrote: > > Is there a list of exceptions to the checkpatch information that we can > ignore in general. > For Greg's subsystems ignore the warning about extra parentheses. You can search on lore for if a patch has been patch has been sent before. Otherwise ignore checkpatch if it tells you to do something that makes the code less readable. regards, dan carpenter
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Sat, May 04, 2024 at 12:39:00AM +0100, Al Viro wrote: > On Fri, May 03, 2024 at 04:16:15PM -0700, Linus Torvalds wrote: > > On Fri, 3 May 2024 at 15:07, Al Viro wrote: > > > > > > Suppose your program calls select() on a pipe and dmabuf, sees data to be > > > read > > > from pipe, reads it, closes both pipe and dmabuf and exits. > > > > > > Would you expect that dmabuf file would stick around for hell knows how > > > long > > > after that? I would certainly be very surprised by running into that... > > > > Why? > > > > That's the _point_ of refcounts. They make the thing they refcount > > stay around until it's no longer referenced. > > > > Now, I agree that dmabuf's are a bit odd in how they use a 'struct > > file' *as* their refcount, but hey, it's a specialty use. Unusual > > perhaps, but not exactly wrong. > > > > I suspect that if you saw a dmabuf just have its own 'refcount_t' and > > stay around until it was done, you wouldn't bat an eye at it, and it's > > really just the "it uses a struct file for counting" that you are > > reacting to. > > *IF* those files are on purely internal filesystem, that's probably > OK; do that with something on something mountable (char device, > sysfs file, etc.) and you have a problem with filesystem staying > busy. In this instance it is ok because dma-buf is an internal fs. I had the exact same reaction you had initially but it doesn't matter for dma-buf afaict as that thing can never be unmounted.
Re: get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove)
On Fri, May 03, 2024 at 12:59:52PM -0700, Kees Cook wrote: > On Fri, May 03, 2024 at 01:35:09PM -0600, Jens Axboe wrote: > > On 5/3/24 1:22 PM, Kees Cook wrote: > > > On Fri, May 03, 2024 at 12:49:11PM -0600, Jens Axboe wrote: > > >> On 5/3/24 12:26 PM, Kees Cook wrote: > > >>> Thanks for doing this analysis! I suspect at least a start of a fix > > >>> would be this: > > >>> > > >>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c > > >>> index 8fe5aa67b167..15e8f74ee0f2 100644 > > >>> --- a/drivers/dma-buf/dma-buf.c > > >>> +++ b/drivers/dma-buf/dma-buf.c > > >>> @@ -267,9 +267,8 @@ static __poll_t dma_buf_poll(struct file *file, > > >>> poll_table *poll) > > >>> > > >>> if (events & EPOLLOUT) { > > >>> /* Paired with fput in dma_buf_poll_cb */ > > >>> - get_file(dmabuf->file); > > >>> - > > >>> - if (!dma_buf_poll_add_cb(resv, true, dcb)) > > >>> + if (!atomic_long_inc_not_zero(&dmabuf->file) && > > >>> + !dma_buf_poll_add_cb(resv, true, dcb)) > > >>> /* No callback queued, wake up any > > >>> other waiters */ > > >> > > >> Don't think this is sane at all. I'm assuming you meant: > > >> > > >> atomic_long_inc_not_zero(&dmabuf->file->f_count); > > > > > > Oops, yes, sorry. I was typed from memory instead of copy/paste. > > > > Figured :-) > > > > >> but won't fly as you're not under RCU in the first place. And what > > >> protects it from being long gone before you attempt this anyway? This is > > >> sane way to attempt to fix it, it's completely opposite of what sane ref > > >> handling should look like. > > >> > > >> Not sure what the best fix is here, seems like dma-buf should hold an > > >> actual reference to the file upfront rather than just stash a pointer > > >> and then later _hope_ that it can just grab a reference. That seems > > >> pretty horrible, and the real source of the issue. > > > > > > AFAICT, epoll just doesn't hold any references at all. It depends, > > > I think, on eventpoll_release() (really eventpoll_release_file()) > > > synchronizing with epoll_wait() (but I don't see how this happens, and > > > the race seems to be against ep_item_poll() ...?) > > > > > > I'm really confused about how eventpoll manages the lifetime of polled > > > fds. > > > > epoll doesn't hold any references, and it's got some ugly callback to > > deal with that. It's not ideal, nor pretty, but that's how it currently > > works. See eventpoll_release() and how it's called. This means that > > epoll itself is supposedly safe from the file going away, even though it > > doesn't hold a reference to it. > > Right -- what remains unclear to me is how struct file lifetime is > expected to work in the struct file_operations::poll callbacks. Because > using get_file() there looks clearly unsafe... If you're in ->poll() you're holding the epoll mutex and eventpoll_release_file() needs to acquire ep->mtx as well. So if you're in ->poll() then you know that eventpoll_release_file() can't progress and therefore eventpoll_release() can't make progress. So f_op->release() won't be able to be called as it happens after eventpoll_release() in __fput(). But f_count being able to go to zero is expected.
Re: get_file() unsafe under epoll (was Re: [syzbot] [fs?] [io-uring?] general protection fault in __ep_remove)
On Fri, May 03, 2024 at 11:26:32AM -0700, Kees Cook wrote: > On Fri, May 03, 2024 at 06:54:22PM +0700, Bui Quang Minh wrote: > > [...] > > Root cause: > > AFAIK, eventpoll (epoll) does not increase the registered file's reference. > > To ensure the safety, when the registered file is deallocated in __fput, > > eventpoll_release is called to unregister the file from epoll. When calling > > poll on epoll, epoll will loop through registered files and call vfs_poll on > > these files. In the file's poll, file is guaranteed to be alive, however, as > > epoll does not increase the registered file's reference, the file may be > > dying > > and it's not safe the get the file for later use. And dma_buf_poll violates > > this. In the dma_buf_poll, it tries to get_file to use in the callback. This > > leads to a race where the dmabuf->file can be fput twice. > > > > Here is the race occurs in the above proof-of-concept > > > > close(dmabuf->file) > > __fput_sync (f_count == 1, last ref) > > f_count-- (f_count == 0 now) > > __fput > > epoll_wait > > vfs_poll(dmabuf->file) > > get_file(dmabuf->file)(f_count == 1) > > eventpoll_release > > dmabuf->file deallocation > > fput(dmabuf->file) (f_count == 1) > > f_count-- > > dmabuf->file deallocation > > > > I am not familiar with the dma_buf so I don't know the proper fix for the > > issue. About the rule that don't get the file for later use in poll callback > > of > > file, I wonder if it is there when only select/poll exist or just after > > epoll > > appears. > > > > I hope the analysis helps us to fix the issue. > > Thanks for doing this analysis! I suspect at least a start of a fix > would be this: > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c > index 8fe5aa67b167..15e8f74ee0f2 100644 > --- a/drivers/dma-buf/dma-buf.c > +++ b/drivers/dma-buf/dma-buf.c > @@ -267,9 +267,8 @@ static __poll_t dma_buf_poll(struct file *file, > poll_table *poll) > > if (events & EPOLLOUT) { > /* Paired with fput in dma_buf_poll_cb */ > - get_file(dmabuf->file); > - > - if (!dma_buf_poll_add_cb(resv, true, dcb)) > + if (!atomic_long_inc_not_zero(&dmabuf->file) && > + !dma_buf_poll_add_cb(resv, true, dcb)) > /* No callback queued, wake up any other > waiters */ > dma_buf_poll_cb(NULL, &dcb->cb); > else > @@ -290,9 +289,8 @@ static __poll_t dma_buf_poll(struct file *file, > poll_table *poll) > > if (events & EPOLLIN) { > /* Paired with fput in dma_buf_poll_cb */ > - get_file(dmabuf->file); > - > - if (!dma_buf_poll_add_cb(resv, false, dcb)) > + if (!atomic_long_inc_not_zero(&dmabuf->file) && > + !dma_buf_poll_add_cb(resv, false, dcb)) > /* No callback queued, wake up any other > waiters */ > dma_buf_poll_cb(NULL, &dcb->cb); > else > > > But this ends up leaving "active" non-zero, and at close time it runs > into: > > BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active); > > But the bottom line is that get_file() is unsafe to use in some places, > one of which appears to be in the poll handler. There are maybe some > other fragile places too, like in drivers/gpu/drm/vmwgfx/ttm_object.c: > > static bool __must_check get_dma_buf_unless_doomed(struct dma_buf *dmabuf) > { > return atomic_long_inc_not_zero(&dmabuf->file->f_count) != 0L; > } > > Which I also note involves a dmabuf... > > Due to this issue I've proposed fixing get_file() to detect pathological > states: > https://lore.kernel.org/lkml/2024050252.work.690-k...@kernel.org/ > > But that has run into some push-back. I'm hoping that seeing this epoll > example will help illustrate what needs fixing a little better. > > I think the best current proposal is to just WARN sooner instead of a > full refcount_t implementation: > > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 8dfd53b52744..e09107d0a3d6 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -1040,7 +1040,8 @@ struct file_handle { > > static inline struct file *get_file(struct file *f) > { > - atomic_long_inc(&f->f_count); > + long prior = atomic_long_fetch_inc_relaxed(&f->f_count); > + WARN_ONCE(!prior, "struct file::f_count incremented from zero; > use-after-free condition present!\n"); > return f; > } > > > > What's the right way to deal with the dmabuf situation? (And I suspect > it applies to get_dma_buf_unless_doomed() as well...) No, it doesn
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Fri, May 03, 2024 at 02:33:37PM -0700, Linus Torvalds wrote: > On Fri, 3 May 2024 at 14:24, Al Viro wrote: > > > > Can we get to ep_item_poll(epi, ...) after eventpoll_release_file() > > got past __ep_remove()? Because if we can, we have a worse problem - > > epi freed under us. > > Look at the hack in __ep_remove(): if it is concurrent with > eventpoll_release_file(), it will hit this code > > spin_lock(&file->f_lock); > if (epi->dying && !force) { > spin_unlock(&file->f_lock); > return false; > } > > and not free the epi. > > But as far as I can tell, almost nothing else cares about the f_lock > and dying logic. > > And in fact, I don't think doing > > spin_lock(&file->f_lock); > > is even valid in the places that look up file through "epi->ffd.file", > because the lock itself is inside the thing that you can't trust until > you've taken the lock... > > So I agree with Kees about the use of "atomic_dec_not_zero()" kind of > logic - but it also needs to be in an RCU-readlocked region, I think. Why isn't it enough to just force dma_buf_poll() to use get_file_active()? Then that whole problem goes away afaict. So the fix I had yesterday before I had to step away from the computer was literally just that [1]. It currently uses two atomic incs potentially but that can probably be fixed by the dma folks to be smarter about when they actually need to take a file reference. > > I wish epoll() just took the damn file ref itself. But since it relies > on the file refcount to release the data structure, that obviously > can't work. > > Linus diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 8fe5aa67b167..7149c45976e1 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -244,13 +244,18 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll) if (!dmabuf || !dmabuf->resv) return EPOLLERR; + if (!get_file_active(&dmabuf->file)) + return EPOLLERR; + resv = dmabuf->resv; poll_wait(file, &dmabuf->poll, poll); events = poll_requested_events(poll) & (EPOLLIN | EPOLLOUT); - if (!events) + if (!events) { + fput(file); return 0; + } dma_resv_lock(resv, NULL); @@ -268,7 +273,6 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll) if (events & EPOLLOUT) { /* Paired with fput in dma_buf_poll_cb */ get_file(dmabuf->file); - if (!dma_buf_poll_add_cb(resv, true, dcb)) /* No callback queued, wake up any other waiters */ dma_buf_poll_cb(NULL, &dcb->cb); @@ -301,6 +305,7 @@ static __poll_t dma_buf_poll(struct file *file, poll_table *poll) } dma_resv_unlock(resv); + fput(file); return events; }
Re: [PATCH] epoll: try to be a _bit_ better about file lifetimes
On Fri, May 03, 2024 at 04:41:19PM -0700, Linus Torvalds wrote: > On Fri, 3 May 2024 at 16:23, Kees Cook wrote: > > > > static bool __must_check get_dma_buf_unless_doomed(struct dma_buf *dmabuf) > > { > > return atomic_long_inc_not_zero(&dmabuf->file->f_count) != 0L; > > } > > > > If we end up adding epi_fget(), we'll have 2 cases of using > > "atomic_long_inc_not_zero" for f_count. Do we need some kind of blessed > > helper to live in file.h or something, with appropriate comments? > > I wonder if we could try to abstract this out a bit more. > > These games with non-ref-counted file structures *feel* a bit like the > games we play with non-ref-counted (aka "stashed") 'struct dentry' > that got fairly recently cleaned up with path_from_stashed() when both > nsfs and pidfs started doing the same thing. > > I'm not loving the TTM use of this thing, but at least the locking and > logic feels a lot more straightforward (ie the > atomic_long_inc_not_zero() here is clealy under the 'prime->mutex' > lock The TTM stuff is somewhat wild though and I've commented on that in https://lore.kernel.org/r/20240503-mitmachen-redakteur-2707ab0cacc3@brauner another thread that it can just use get_active_file(). Afaict, there's dma_buf_export() that allocates a new file and sets: file->private_data = dmabuf; dmabuf->file = file; dentry->d_fsdata = dmabuf; The file has f_op->release::dma_buf_file_release() as it's f_op->release method. When that's called the file's refcount is already zero but the file has not been freed yet. This will remove the dmabuf from some public list but it won't free it. dmabuf dentries have dma_buf_dentry_ops which use dentry->d_release::dma_buf_release() to release the actual dmabuf stashed in dentry->d_fsdata. So that ends up with: __fput() -> f_op->release::dma_buf_file_release() // handles file specific freeing -> dput() -> d_op->d_release::dma_buf_release() // handles dmabuf freeing // including the driver specific stuff. If you fput() the file then the dmabuf will be freed as well immediately after it when the dput() happens in __fput(). So that TTM thing does something else then in ttm_object_device_init(). It copies the dma_buf_ops into tdev->ops and replaces the dma_buf_ops release method with it's own ttm_prime_dmabuf_release() and stashes the old on in tdev->dma_buf_release. And it uses that to hook into the release path so that @dmabuf will still be valid for get_dma_buf_unless_doomed() under prime->mutex. But again, get_dma_buf_unless_doomed() can just be replaced with get_active_file() and then we're done with that part. > IOW, the tty use looks correct to me, and it has fairly simple locking > and is just catching the the race between 'fput()' decrementing the > refcount and and 'file->f_op->release()' doing the actual release. > > You are right that it's similar to the epoll thing in that sense, it > just looks a _lot_ more straightforward to me (and, unlike epoll, > doesn't look actively buggy right now). It's not buggy afaict. It literally can just switch to get_active_file() instead of open-coding it and we're done imho.