Re: [Freedreno] [PATCH 2/2] drm/msm: make msm_disp_state transient data struct
On 2021-04-26 15:14, Dmitry Baryshkov wrote: On Tue, 27 Apr 2021 at 01:03, wrote: On 2021-04-26 14:23, Dmitry Baryshkov wrote: > On 26/04/2021 23:50, abhin...@codeaurora.org wrote: >> On 2021-04-25 09:08, Dmitry Baryshkov wrote: >>> Instead of allocating snapshotting structure at the driver probe time >>> and later handling concurrent access, actual state, etc, make >>> msm_disp_state transient struct. Allocate one when snapshotting >>> happens >>> and free it after coredump data is read by userspace. >> >> the purpose of the mutex is that when there are two coredumps being >> triggered >> by two consecutive errors, we want to make sure that till one coredump >> is completely >> read and/or processed, we do not allow triggering of another one as we >> want to capture >> the accurate hardware/software state. >> >> So moving disp_state out of kms might be okay and just allocated it >> when actually capturing the state >> but we will need the mutex and some sort of pending flag in my >> opinion. > > I'll return the mutex to the kms struct, so that we won't start > another snapshot untill previous one is complete. I think mutex should remain as part of snapshot so that they go together. Since this mutex is not used by any other module, I thought its better to keep it there. I don't think so: we will serialize access to the snapshot, but not dumping between snapshots. Note, that your code also serializes writing to snapshot, not reading from it. With disp_state being allocated on demand we do not have to protect its contents (since it is not available before registration using dev_codedumpm). I thought that you wanted to be sure that one snapshot is fully captured before next error triggers the next snapshot. And for this we'd need 'global' mutex (in kms). True, the intention of the mutex is to serialize access to snapshot. the coredump_pending flag is the one which will serialize the write and read because its set to false only in the free() method of the coredump. I was referring to the pending flag when I meant that it will serialize read/write. Regarding the mutex, well, if you word it that way that it protects one snapshot from being overwritten by the other and hence we need a 'global' mutex in kms, it convinced me :) The only reason it was within the disp_state itself was because its not used outside the module but I am okay with moving it to kms if Rob is okay with that. > Concerning the pending flag, I think it is also handled by the > coredump code: dev_coredumpm() will not create another device if there > is one already associated with the device being dumped. I should > probably mention this in the commit message. Thats a good point, I checked that now as well but I still think we need this flag because it also makes sure devcoredump is taken and read together within our driver itself instead of relying on the devcoredump. Its not a strong preference. But if you would like to go ahead with this, might have to retest its robustness. With the older logic how i verified this was that I relaxed the underrun cnt check in this patch ( https://patchwork.freedesktop.org/patch/429112/?series=89181=1 ) here and simulated an error: @@ -1336,6 +1337,11 @@ static void dpu_encoder_underrun_callback(struct drm_encoder *drm_enc, DPU_ATRACE_BEGIN("encoder_underrun_callback"); atomic_inc(_enc->underrun_cnt); + + /* trigger dump only on the first underrun */ + if (atomic_read(_enc->underrun_cnt) == 1) + msm_disp_snapshot_state(drm_enc->dev); + And verified that across various underrun interrupts, the devcoredump is stable. I'll try testing it this way, thank you for the pointer! > > If you agree with this, I'll send v2 then (also adding plls dumping). Looking fwd to seeing the pll dumping, that will be a great addition to this. > >> >>> >>> Signed-off-by: Dmitry Baryshkov >>> --- >>> drivers/gpu/drm/msm/disp/msm_disp_snapshot.c | 87 >>> --- >>> drivers/gpu/drm/msm/disp/msm_disp_snapshot.h | 13 +-- >>> .../gpu/drm/msm/disp/msm_disp_snapshot_util.c | 5 +- >>> drivers/gpu/drm/msm/msm_kms.h | 5 +- >>> 4 files changed, 28 insertions(+), 82 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c >>> b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c >>> index 70fd5a1fe13e..371358893716 100644 >>> --- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c >>> +++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c >>> @@ -7,8 +7,7 @@ >>> >>> #include "msm_disp_snapshot.h" >>> >>> -#ifdef CONFIG_DEV_COREDUMP >>> -static ssize_t disp_devcoredump_read(char *buffer, loff_t offset, >>> +static ssize_t __maybe_unused disp_devcoredump_read(char *buffer, >>> loff_t offset, >>> size_t count, void *data, size_t datalen) >>> { >>> struct drm_print_iterator iter; >>> @@ -29,24 +28,21 @@ static ssize_t disp_devcoredump_read(char >>> *buffer, >>> loff_t offset, >>> return count -
Re: [Freedreno] [PATCH 2/2] drm/msm: make msm_disp_state transient data struct
On Tue, 27 Apr 2021 at 01:03, wrote: > > On 2021-04-26 14:23, Dmitry Baryshkov wrote: > > On 26/04/2021 23:50, abhin...@codeaurora.org wrote: > >> On 2021-04-25 09:08, Dmitry Baryshkov wrote: > >>> Instead of allocating snapshotting structure at the driver probe time > >>> and later handling concurrent access, actual state, etc, make > >>> msm_disp_state transient struct. Allocate one when snapshotting > >>> happens > >>> and free it after coredump data is read by userspace. > >> > >> the purpose of the mutex is that when there are two coredumps being > >> triggered > >> by two consecutive errors, we want to make sure that till one coredump > >> is completely > >> read and/or processed, we do not allow triggering of another one as we > >> want to capture > >> the accurate hardware/software state. > >> > >> So moving disp_state out of kms might be okay and just allocated it > >> when actually capturing the state > >> but we will need the mutex and some sort of pending flag in my > >> opinion. > > > > I'll return the mutex to the kms struct, so that we won't start > > another snapshot untill previous one is complete. > > I think mutex should remain as part of snapshot so that they go > together. Since this mutex is not used > by any other module, I thought its better to keep it there. I don't think so: we will serialize access to the snapshot, but not dumping between snapshots. Note, that your code also serializes writing to snapshot, not reading from it. With disp_state being allocated on demand we do not have to protect its contents (since it is not available before registration using dev_codedumpm). I thought that you wanted to be sure that one snapshot is fully captured before next error triggers the next snapshot. And for this we'd need 'global' mutex (in kms). > > Concerning the pending flag, I think it is also handled by the > > coredump code: dev_coredumpm() will not create another device if there > > is one already associated with the device being dumped. I should > > probably mention this in the commit message. > > Thats a good point, I checked that now as well but I still think we need > this flag because > it also makes sure devcoredump is taken and read together within our > driver itself instead of relying > on the devcoredump. Its not a strong preference. > > But if you would like to go ahead with this, might have to retest its > robustness. > With the older logic how i verified this was that I relaxed the underrun > cnt check in this patch > ( https://patchwork.freedesktop.org/patch/429112/?series=89181=1 ) > here and simulated an error: > > @@ -1336,6 +1337,11 @@ static void dpu_encoder_underrun_callback(struct > drm_encoder *drm_enc, > > DPU_ATRACE_BEGIN("encoder_underrun_callback"); > atomic_inc(_enc->underrun_cnt); > + > + /* trigger dump only on the first underrun */ > + if (atomic_read(_enc->underrun_cnt) == 1) > + msm_disp_snapshot_state(drm_enc->dev); > + > > And verified that across various underrun interrupts, the devcoredump is > stable. I'll try testing it this way, thank you for the pointer! > > > > > If you agree with this, I'll send v2 then (also adding plls dumping). > Looking fwd to seeing the pll dumping, that will be a great addition to > this. > > > >> > >>> > >>> Signed-off-by: Dmitry Baryshkov > >>> --- > >>> drivers/gpu/drm/msm/disp/msm_disp_snapshot.c | 87 > >>> --- > >>> drivers/gpu/drm/msm/disp/msm_disp_snapshot.h | 13 +-- > >>> .../gpu/drm/msm/disp/msm_disp_snapshot_util.c | 5 +- > >>> drivers/gpu/drm/msm/msm_kms.h | 5 +- > >>> 4 files changed, 28 insertions(+), 82 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c > >>> b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c > >>> index 70fd5a1fe13e..371358893716 100644 > >>> --- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c > >>> +++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c > >>> @@ -7,8 +7,7 @@ > >>> > >>> #include "msm_disp_snapshot.h" > >>> > >>> -#ifdef CONFIG_DEV_COREDUMP > >>> -static ssize_t disp_devcoredump_read(char *buffer, loff_t offset, > >>> +static ssize_t __maybe_unused disp_devcoredump_read(char *buffer, > >>> loff_t offset, > >>> size_t count, void *data, size_t datalen) > >>> { > >>> struct drm_print_iterator iter; > >>> @@ -29,24 +28,21 @@ static ssize_t disp_devcoredump_read(char > >>> *buffer, > >>> loff_t offset, > >>> return count - iter.remain; > >>> } > >>> > >>> -static void disp_devcoredump_free(void *data) > >>> +static void _msm_disp_snapshot_work(struct kthread_work *work) > >>> { > >>> +struct msm_kms *msm_kms = container_of(work, struct msm_kms, > >>> dump_work); > >>> +struct drm_device *drm_dev = msm_kms->dev; > >>> struct msm_disp_state *disp_state; > >>> +struct drm_printer p; > >>> > >>> -disp_state = data; > >>> - > >>> -msm_disp_state_free(disp_state); > >>> - > >>> -disp_state->coredump_pending = false;
Re: [Freedreno] [PATCH 2/2] drm/msm: make msm_disp_state transient data struct
On 2021-04-26 14:23, Dmitry Baryshkov wrote: On 26/04/2021 23:50, abhin...@codeaurora.org wrote: On 2021-04-25 09:08, Dmitry Baryshkov wrote: Instead of allocating snapshotting structure at the driver probe time and later handling concurrent access, actual state, etc, make msm_disp_state transient struct. Allocate one when snapshotting happens and free it after coredump data is read by userspace. the purpose of the mutex is that when there are two coredumps being triggered by two consecutive errors, we want to make sure that till one coredump is completely read and/or processed, we do not allow triggering of another one as we want to capture the accurate hardware/software state. So moving disp_state out of kms might be okay and just allocated it when actually capturing the state but we will need the mutex and some sort of pending flag in my opinion. I'll return the mutex to the kms struct, so that we won't start another snapshot untill previous one is complete. I think mutex should remain as part of snapshot so that they go together. Since this mutex is not used by any other module, I thought its better to keep it there. Concerning the pending flag, I think it is also handled by the coredump code: dev_coredumpm() will not create another device if there is one already associated with the device being dumped. I should probably mention this in the commit message. Thats a good point, I checked that now as well but I still think we need this flag because it also makes sure devcoredump is taken and read together within our driver itself instead of relying on the devcoredump. Its not a strong preference. But if you would like to go ahead with this, might have to retest its robustness. With the older logic how i verified this was that I relaxed the underrun cnt check in this patch ( https://patchwork.freedesktop.org/patch/429112/?series=89181=1 ) here and simulated an error: @@ -1336,6 +1337,11 @@ static void dpu_encoder_underrun_callback(struct drm_encoder *drm_enc, DPU_ATRACE_BEGIN("encoder_underrun_callback"); atomic_inc(_enc->underrun_cnt); + + /* trigger dump only on the first underrun */ + if (atomic_read(_enc->underrun_cnt) == 1) + msm_disp_snapshot_state(drm_enc->dev); + And verified that across various underrun interrupts, the devcoredump is stable. If you agree with this, I'll send v2 then (also adding plls dumping). Looking fwd to seeing the pll dumping, that will be a great addition to this. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/msm/disp/msm_disp_snapshot.c | 87 --- drivers/gpu/drm/msm/disp/msm_disp_snapshot.h | 13 +-- .../gpu/drm/msm/disp/msm_disp_snapshot_util.c | 5 +- drivers/gpu/drm/msm/msm_kms.h | 5 +- 4 files changed, 28 insertions(+), 82 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c index 70fd5a1fe13e..371358893716 100644 --- a/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c +++ b/drivers/gpu/drm/msm/disp/msm_disp_snapshot.c @@ -7,8 +7,7 @@ #include "msm_disp_snapshot.h" -#ifdef CONFIG_DEV_COREDUMP -static ssize_t disp_devcoredump_read(char *buffer, loff_t offset, +static ssize_t __maybe_unused disp_devcoredump_read(char *buffer, loff_t offset, size_t count, void *data, size_t datalen) { struct drm_print_iterator iter; @@ -29,24 +28,21 @@ static ssize_t disp_devcoredump_read(char *buffer, loff_t offset, return count - iter.remain; } -static void disp_devcoredump_free(void *data) +static void _msm_disp_snapshot_work(struct kthread_work *work) { + struct msm_kms *msm_kms = container_of(work, struct msm_kms, dump_work); + struct drm_device *drm_dev = msm_kms->dev; struct msm_disp_state *disp_state; + struct drm_printer p; - disp_state = data; - - msm_disp_state_free(disp_state); - - disp_state->coredump_pending = false; -} -#endif /* CONFIG_DEV_COREDUMP */ + disp_state = kzalloc(sizeof(struct msm_disp_state), GFP_KERNEL); + if (!disp_state) + return; -static void _msm_disp_snapshot_work(struct kthread_work *work) -{ - struct msm_disp_state *disp_state = container_of(work, struct msm_disp_state, dump_work); - struct drm_printer p; + disp_state->dev = drm_dev->dev; + disp_state->drm_dev = drm_dev; - mutex_lock(_state->mutex); + INIT_LIST_HEAD(_state->blocks); msm_disp_snapshot_capture_state(disp_state); @@ -55,26 +51,15 @@ static void _msm_disp_snapshot_work(struct kthread_work *work) msm_disp_state_print(disp_state, ); } - /* - * if devcoredump is not defined free the state immediately - * otherwise it will be freed in the free handler. - */ -#ifdef CONFIG_DEV_COREDUMP + /* If COREDUMP is disabled, the stub will call the free function. */ This is a good cleanup, I just checked that the free() is called even if the config is not enabled