Re: [Freedreno] [PATCH v3] drm/msm/dp: check hpd_state before push idle pattern at dp_bridge_disable()

2022-08-18 Thread Abhinav Kumar

Hi Stephen

On 8/15/2022 10:08 AM, Stephen Boyd wrote:

Quoting Kuogee Hsieh (2022-08-11 08:20:01)


On 8/10/2022 6:00 PM, Abhinav Kumar wrote:


Even then, you do have a valid point. DRM framework should not have
caused the disable path to happen without an enable.

I went through the stack mentioned in the issue.

Lets see this part of the stack:

dp_ctrl_push_idle+0x40/0x88
  dp_bridge_disable+0x24/0x30
  drm_atomic_bridge_chain_disable+0x90/0xbc
  drm_atomic_helper_commit_modeset_disables+0x198/0x444
  msm_atomic_commit_tail+0x1d0/0x374

In drm_atomic_helper_commit_modeset_disables(), we call
disable_outputs().

AFAICT, this is the only place which has a protection to not call the
disable() flow if it was not enabled here:

https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063


But that function is only checking crtc_state->active. Thats set by
the usermode:

https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407


Now, if usermode sets that to true and then crashed it can bypass this
check and we will crash in the location kuogee is trying to fix.


That seems bad, no? We don't want userspace to be able to crash and then
be able to call the disable path when enable never succeeded.



 From the issue mentioned in
https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did
mention the usermode crashed.

So this is my tentative analysis of whats happening here.

Ideally yes, we should have been protected by the location mentioned
above in disable_outputs() but looks to me due to the above hypothesis
its getting bypassed.


Can you fix the problem there? Not fixing it means that every driver out
there has to develop the same "fix", when it could be fixed in the core
one time.



As per discussion on IRC with Rob, we have pushed another fix for this 
issue 
https://lore.kernel.org/all/1660759314-28088-1-git-send-email-quic_khs...@quicinc.com/.


So, we can drop this one in favor of the other.

Thanks

Abhinav

Ideally drivers are simple. They configure the hardware for what the
function pointer is asking for. State management and things like that
should be pushed into the core framework so that we don't have to
duplicate that multiple times.



Thanks

Abhinav



Ii sound likes that there is a hole either at user space or drm.

But that should not cause dp_bridge_disable() at dp driver to crash.


Agreed.



Therefore it is properly to check hdp_state condition at
dp_bridge_disable() to prevent it from crashing.



Disagree. Userspace shouldn't be able to get drm into a wedged state.


Re: [Freedreno] [PATCH v3] drm/msm/dp: check hpd_state before push idle pattern at dp_bridge_disable()

2022-08-16 Thread Kuogee Hsieh



On 8/15/2022 10:08 AM, Stephen Boyd wrote:

Quoting Kuogee Hsieh (2022-08-11 08:20:01)

On 8/10/2022 6:00 PM, Abhinav Kumar wrote:

Even then, you do have a valid point. DRM framework should not have
caused the disable path to happen without an enable.

I went through the stack mentioned in the issue.

Lets see this part of the stack:

dp_ctrl_push_idle+0x40/0x88
  dp_bridge_disable+0x24/0x30
  drm_atomic_bridge_chain_disable+0x90/0xbc
  drm_atomic_helper_commit_modeset_disables+0x198/0x444
  msm_atomic_commit_tail+0x1d0/0x374

In drm_atomic_helper_commit_modeset_disables(), we call
disable_outputs().

AFAICT, this is the only place which has a protection to not call the
disable() flow if it was not enabled here:

https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063


But that function is only checking crtc_state->active. Thats set by
the usermode:

https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407


Now, if usermode sets that to true and then crashed it can bypass this
check and we will crash in the location kuogee is trying to fix.

That seems bad, no? We don't want userspace to be able to crash and then
be able to call the disable path when enable never succeeded.


 From the issue mentioned in
https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did
mention the usermode crashed.

So this is my tentative analysis of whats happening here.

Ideally yes, we should have been protected by the location mentioned
above in disable_outputs() but looks to me due to the above hypothesis
its getting bypassed.

Can you fix the problem there? Not fixing it means that every driver out
there has to develop the same "fix", when it could be fixed in the core
one time.


The reporter is running GNOME Display Manager (gdm3) instead of chrome.

We can not duplicate this problem locally. Hence we can not confirm this 
is the root cause of this bug or not.


Do you know who is more proper to investigate more into this?


Ideally drivers are simple. They configure the hardware for what the
function pointer is asking for. State management and things like that
should be pushed into the core framework so that we don't have to
duplicate that multiple times.


Thanks

Abhinav



Ii sound likes that there is a hole either at user space or drm.

But that should not cause dp_bridge_disable() at dp driver to crash.

Agreed.


Therefore it is properly to check hdp_state condition at
dp_bridge_disable() to prevent it from crashing.


Disagree. Userspace shouldn't be able to get drm into a wedged state.


not sure for this.

I agree if this is simple driver such as i2c which does not need to 
maintain any states during operation.


but mdp/dp is far more complexity. we really do not want to have any 
crash happen at mdp/dp in the filed.


would you please reconsider our point to add this hdp_state checking 
here to prevent any crash happen.




Re: [Freedreno] [PATCH v3] drm/msm/dp: check hpd_state before push idle pattern at dp_bridge_disable()

2022-08-15 Thread Stephen Boyd
Quoting Kuogee Hsieh (2022-08-11 08:20:01)
>
> On 8/10/2022 6:00 PM, Abhinav Kumar wrote:
> >
> > Even then, you do have a valid point. DRM framework should not have
> > caused the disable path to happen without an enable.
> >
> > I went through the stack mentioned in the issue.
> >
> > Lets see this part of the stack:
> >
> > dp_ctrl_push_idle+0x40/0x88
> >  dp_bridge_disable+0x24/0x30
> >  drm_atomic_bridge_chain_disable+0x90/0xbc
> >  drm_atomic_helper_commit_modeset_disables+0x198/0x444
> >  msm_atomic_commit_tail+0x1d0/0x374
> >
> > In drm_atomic_helper_commit_modeset_disables(), we call
> > disable_outputs().
> >
> > AFAICT, this is the only place which has a protection to not call the
> > disable() flow if it was not enabled here:
> >
> > https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063
> >
> >
> > But that function is only checking crtc_state->active. Thats set by
> > the usermode:
> >
> > https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407
> >
> >
> > Now, if usermode sets that to true and then crashed it can bypass this
> > check and we will crash in the location kuogee is trying to fix.

That seems bad, no? We don't want userspace to be able to crash and then
be able to call the disable path when enable never succeeded.

> >
> > From the issue mentioned in
> > https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did
> > mention the usermode crashed.
> >
> > So this is my tentative analysis of whats happening here.
> >
> > Ideally yes, we should have been protected by the location mentioned
> > above in disable_outputs() but looks to me due to the above hypothesis
> > its getting bypassed.

Can you fix the problem there? Not fixing it means that every driver out
there has to develop the same "fix", when it could be fixed in the core
one time.

Ideally drivers are simple. They configure the hardware for what the
function pointer is asking for. State management and things like that
should be pushed into the core framework so that we don't have to
duplicate that multiple times.

> >
> > Thanks
> >
> > Abhinav
> >
> >
> Ii sound likes that there is a hole either at user space or drm.
>
> But that should not cause dp_bridge_disable() at dp driver to crash.

Agreed.

>
> Therefore it is properly to check hdp_state condition at
> dp_bridge_disable() to prevent it from crashing.
>

Disagree. Userspace shouldn't be able to get drm into a wedged state.


Re: [Freedreno] [PATCH v3] drm/msm/dp: check hpd_state before push idle pattern at dp_bridge_disable()

2022-08-11 Thread Kuogee Hsieh



On 8/10/2022 6:00 PM, Abhinav Kumar wrote:

Hi Stephen

On 8/10/2022 5:09 PM, Stephen Boyd wrote:

Quoting Kuogee Hsieh (2022-08-10 16:57:51)


On 8/10/2022 3:22 PM, Stephen Boyd wrote:

Quoting Kuogee Hsieh (2022-08-10 12:25:51)
diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c

index b36f8b6..678289a 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge 
*drm_bridge)

  struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
  struct msm_dp *dp = dp_bridge->dp_display;
  struct dp_display_private *dp_display;
+   u32 state;

  dp_display = container_of(dp, struct dp_display_private, 
dp_display);


+   mutex_lock(_display->event_mutex);
+
+   state = dp_display->hpd_state;
+   if (state != ST_DISCONNECT_PENDING && state != 
ST_CONNECTED) {

It's concerning that we have to check this at all. Are we still
interjecting into the disable path when the cable is disconnected?


yes,

The problem is not from cable disconnected.

There is a corner case that this function is called at drm shutdown
(drm_release).

At that time, mainlink is not enabled, hence dp_ctrl_push_idle() will
cause system crash.


The mainlink is only disabled when the cable is disconnected though?

Let me put it this way, if we have to check that the state is
"connected" or "disconnected pending" in the disable path then there's
an issue where this driver is being called in unexpected ways. This
driver is fighting the drm core each time there's a state check. We
really need to get rid of the state tracking entirely, and make sure
that the drm core is calling into the driver at the right time, i.e.
bridge disable is only called when the mainlink is enabled, etc.


So if link training failed, we do not send a uevent to usermode and 
will bail out here:


    rc = dp_ctrl_on_link(dp->ctrl);
    if (rc) {
    DRM_ERROR("failed to complete DP link training\n");
    goto end;
    }

So this commit is not coming from usermode but from the drm_release() 
path.


Even then, you do have a valid point. DRM framework should not have 
caused the disable path to happen without an enable.


I went through the stack mentioned in the issue.

Lets see this part of the stack:

dp_ctrl_push_idle+0x40/0x88
 dp_bridge_disable+0x24/0x30
 drm_atomic_bridge_chain_disable+0x90/0xbc
 drm_atomic_helper_commit_modeset_disables+0x198/0x444
 msm_atomic_commit_tail+0x1d0/0x374

In drm_atomic_helper_commit_modeset_disables(), we call 
disable_outputs().


AFAICT, this is the only place which has a protection to not call the 
disable() flow if it was not enabled here:


https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063 



But that function is only checking crtc_state->active. Thats set by 
the usermode:


https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407 



Now, if usermode sets that to true and then crashed it can bypass this 
check and we will crash in the location kuogee is trying to fix.


From the issue mentioned in 
https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did 
mention the usermode crashed.


So this is my tentative analysis of whats happening here.

Ideally yes, we should have been protected by the location mentioned 
above in disable_outputs() but looks to me due to the above hypothesis 
its getting bypassed.


Thanks

Abhinav



Ii sound likes that there is a hole either at user space or drm.

But that should not cause dp_bridge_disable() at dp driver to crash.

Therefore it is properly to check hdp_state condition at 
dp_bridge_disable() to prevent it from crashing.





Re: [Freedreno] [PATCH v3] drm/msm/dp: check hpd_state before push idle pattern at dp_bridge_disable()

2022-08-10 Thread Abhinav Kumar

Hi Stephen

On 8/10/2022 5:09 PM, Stephen Boyd wrote:

Quoting Kuogee Hsieh (2022-08-10 16:57:51)


On 8/10/2022 3:22 PM, Stephen Boyd wrote:

Quoting Kuogee Hsieh (2022-08-10 12:25:51)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index b36f8b6..678289a 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge *drm_bridge)
  struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
  struct msm_dp *dp = dp_bridge->dp_display;
  struct dp_display_private *dp_display;
+   u32 state;

  dp_display = container_of(dp, struct dp_display_private, dp_display);

+   mutex_lock(_display->event_mutex);
+
+   state = dp_display->hpd_state;
+   if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {

It's concerning that we have to check this at all. Are we still
interjecting into the disable path when the cable is disconnected?


yes,

The problem is not from cable disconnected.

There is a corner case that this function is called at drm shutdown
(drm_release).

At that time, mainlink is not enabled, hence dp_ctrl_push_idle() will
cause system crash.


The mainlink is only disabled when the cable is disconnected though?

Let me put it this way, if we have to check that the state is
"connected" or "disconnected pending" in the disable path then there's
an issue where this driver is being called in unexpected ways. This
driver is fighting the drm core each time there's a state check. We
really need to get rid of the state tracking entirely, and make sure
that the drm core is calling into the driver at the right time, i.e.
bridge disable is only called when the mainlink is enabled, etc.


So if link training failed, we do not send a uevent to usermode and will 
bail out here:


rc = dp_ctrl_on_link(dp->ctrl);
if (rc) {
DRM_ERROR("failed to complete DP link training\n");
goto end;
}

So this commit is not coming from usermode but from the drm_release() path.

Even then, you do have a valid point. DRM framework should not have 
caused the disable path to happen without an enable.


I went through the stack mentioned in the issue.

Lets see this part of the stack:

dp_ctrl_push_idle+0x40/0x88
 dp_bridge_disable+0x24/0x30
 drm_atomic_bridge_chain_disable+0x90/0xbc
 drm_atomic_helper_commit_modeset_disables+0x198/0x444
 msm_atomic_commit_tail+0x1d0/0x374

In drm_atomic_helper_commit_modeset_disables(), we call disable_outputs().

AFAICT, this is the only place which has a protection to not call the 
disable() flow if it was not enabled here:


https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_helper.c#L1063

But that function is only checking crtc_state->active. Thats set by the 
usermode:


https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/drm_atomic_uapi.c#L407

Now, if usermode sets that to true and then crashed it can bypass this 
check and we will crash in the location kuogee is trying to fix.


From the issue mentioned in 
https://gitlab.freedesktop.org/drm/msm/-/issues/17, the reporter did 
mention the usermode crashed.


So this is my tentative analysis of whats happening here.

Ideally yes, we should have been protected by the location mentioned 
above in disable_outputs() but looks to me due to the above hypothesis 
its getting bypassed.


Thanks

Abhinav





Re: [Freedreno] [PATCH v3] drm/msm/dp: check hpd_state before push idle pattern at dp_bridge_disable()

2022-08-10 Thread Stephen Boyd
Quoting Kuogee Hsieh (2022-08-10 16:57:51)
>
> On 8/10/2022 3:22 PM, Stephen Boyd wrote:
> > Quoting Kuogee Hsieh (2022-08-10 12:25:51)
> >> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
> >> b/drivers/gpu/drm/msm/dp/dp_display.c
> >> index b36f8b6..678289a 100644
> >> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> >> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> >> @@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge 
> >> *drm_bridge)
> >>  struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
> >>  struct msm_dp *dp = dp_bridge->dp_display;
> >>  struct dp_display_private *dp_display;
> >> +   u32 state;
> >>
> >>  dp_display = container_of(dp, struct dp_display_private, 
> >> dp_display);
> >>
> >> +   mutex_lock(_display->event_mutex);
> >> +
> >> +   state = dp_display->hpd_state;
> >> +   if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {
> > It's concerning that we have to check this at all. Are we still
> > interjecting into the disable path when the cable is disconnected?
>
> yes,
>
> The problem is not from cable disconnected.
>
> There is a corner case that this function is called at drm shutdown
> (drm_release).
>
> At that time, mainlink is not enabled, hence dp_ctrl_push_idle() will
> cause system crash.

The mainlink is only disabled when the cable is disconnected though?

Let me put it this way, if we have to check that the state is
"connected" or "disconnected pending" in the disable path then there's
an issue where this driver is being called in unexpected ways. This
driver is fighting the drm core each time there's a state check. We
really need to get rid of the state tracking entirely, and make sure
that the drm core is calling into the driver at the right time, i.e.
bridge disable is only called when the mainlink is enabled, etc.


Re: [Freedreno] [PATCH v3] drm/msm/dp: check hpd_state before push idle pattern at dp_bridge_disable()

2022-08-10 Thread Kuogee Hsieh



On 8/10/2022 3:22 PM, Stephen Boyd wrote:

Quoting Kuogee Hsieh (2022-08-10 12:25:51)

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
b/drivers/gpu/drm/msm/dp/dp_display.c
index b36f8b6..678289a 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge *drm_bridge)
 struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
 struct msm_dp *dp = dp_bridge->dp_display;
 struct dp_display_private *dp_display;
+   u32 state;

 dp_display = container_of(dp, struct dp_display_private, dp_display);

+   mutex_lock(_display->event_mutex);
+
+   state = dp_display->hpd_state;
+   if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {

It's concerning that we have to check this at all. Are we still
interjecting into the disable path when the cable is disconnected?


yes,

The problem is not from cable disconnected.

There is a corner case that this function is called at drm shutdown 
(drm_release).


At that time, mainlink is not enabled, hence dp_ctrl_push_idle() will 
cause system crash.





+   mutex_unlock(_display->event_mutex);
+   return;
+   }
+
 dp_ctrl_push_idle(dp_display->ctrl);
+   mutex_unlock(_display->event_mutex);


Re: [Freedreno] [PATCH v3] drm/msm/dp: check hpd_state before push idle pattern at dp_bridge_disable()

2022-08-10 Thread Stephen Boyd
Quoting Kuogee Hsieh (2022-08-10 12:25:51)
> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c 
> b/drivers/gpu/drm/msm/dp/dp_display.c
> index b36f8b6..678289a 100644
> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> @@ -1729,10 +1729,20 @@ void dp_bridge_disable(struct drm_bridge *drm_bridge)
> struct msm_dp_bridge *dp_bridge = to_dp_bridge(drm_bridge);
> struct msm_dp *dp = dp_bridge->dp_display;
> struct dp_display_private *dp_display;
> +   u32 state;
>
> dp_display = container_of(dp, struct dp_display_private, dp_display);
>
> +   mutex_lock(_display->event_mutex);
> +
> +   state = dp_display->hpd_state;
> +   if (state != ST_DISCONNECT_PENDING && state != ST_CONNECTED) {

It's concerning that we have to check this at all. Are we still
interjecting into the disable path when the cable is disconnected?

> +   mutex_unlock(_display->event_mutex);
> +   return;
> +   }
> +
> dp_ctrl_push_idle(dp_display->ctrl);
> +   mutex_unlock(_display->event_mutex);