> -----Original Message-----
> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of
> Chen, Guchun
> Sent: Monday, October 21, 2019 10:29 PM
> To: amd-gfx@lists.freedesktop.org; Koenig, Christian
> <christian.koe...@amd.com>; Zhang, Hawking
> <hawking.zh...@amd.com>; Li, Dennis <dennis...@amd.com>;
> Grodzovsky, Andrey <andrey.grodzov...@amd.com>; Zhou1, Tao
> <tao.zh...@amd.com>
> Cc: Li, Candice <candice...@amd.com>; Chen, Guchun
> <guchun.c...@amd.com>
> Subject: [PATCH] drm/amdgpu: refine reboot debugfs operation in ras case
> (v3)
>
> The RAS reboot debugfs node gives the user an easy way to avoid the GPU
> recovery hang and reboot the system directly, on a per-card basis, after
> a RAS uncorrectable error happens. However, it is a common setting, so it
> should be moved out of the ras_ctrl node and should not depend on an IP
> block name in the user's input. So add a new auto_reboot node in the RAS
> debugfs dir to achieve this.
>
> v2: in commit message, add justification why ras reboot debugfs node is
> needed.
> v3: use debugfs_create_bool to create debugfs file for boolean value
>
> Signed-off-by: Guchun Chen <guchun.c...@amd.com>
Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 6220394521e4..2d9e13d2a71a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -153,8 +153,6 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f,
>  		op = 1;
>  	else if (sscanf(str, "inject %32s %8s", block_name, err) == 2)
>  		op = 2;
> -	else if (sscanf(str, "reboot %32s", block_name) == 1)
> -		op = 3;
>  	else if (str[0] && str[1] && str[2] && str[3])
>  		/* ascii string, but commands are not matched. */
>  		return -EINVAL;
> @@ -218,12 +216,11 @@ static struct ras_manager *amdgpu_ras_find_obj(struct amdgpu_device *adev,
>   * value to the address.
>   *
>   * Second member: struct ras_debug_if::op.
> - * It has four kinds of operations.
> + * It has three kinds of operations.
>   *
>   * - 0: disable RAS on the block. Take ::head as its data.
>   * - 1: enable RAS on the block. Take ::head as its data.
>   * - 2: inject errors on the block. Take ::inject as its data.
> - * - 3: reboot on unrecoverable error
>   *
>   * How to use the interface?
>   * programs:
> @@ -305,9 +302,6 @@ static ssize_t amdgpu_ras_debugfs_ctrl_write(struct file *f, const char __user *
>  		/* data.inject.address is offset instead of absolute gpu address */
>  		ret = amdgpu_ras_error_inject(adev, &data.inject);
>  		break;
> -	case 3:
> -		amdgpu_ras_get_context(adev)->reboot = true;
> -		break;
>  	default:
>  		ret = -EINVAL;
>  		break;
> @@ -1037,6 +1031,17 @@ static void amdgpu_ras_debugfs_create_ctrl_node(struct amdgpu_device *adev)
>  				adev, &amdgpu_ras_debugfs_ctrl_ops);
>  	debugfs_create_file("ras_eeprom_reset", S_IWUGO | S_IRUGO, con->dir,
>  				adev, &amdgpu_ras_debugfs_eeprom_ops);
> +
> +	/*
> +	 * After one uncorrectable error happens, usually GPU recovery will
> +	 * be scheduled. But due to the known problem in GPU recovery failing
> +	 * to bring GPU back, below interface provides one direct way to
> +	 * user to reboot system automatically in such case within
> +	 * ERREVENT_ATHUB_INTERRUPT generated. Normal GPU recovery routine
> +	 * will never be called.
> +	 */
> +	debugfs_create_bool("auto_reboot", S_IWUGO | S_IRUGO, con->dir,
> +				&con->reboot);
>  }
>
>  void amdgpu_ras_debugfs_create(struct amdgpu_device *adev,
> --
> 2.17.1
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
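
For anyone testing this from userspace, here is a minimal sketch of toggling
the new node. It assumes debugfs is mounted at /sys/kernel/debug and that the
RAS directory sits under dri/<minor>/ras, so the node for minor 0 would be
/sys/kernel/debug/dri/0/ras/auto_reboot; the patch itself does not state that
path. ras_set_auto_reboot() and ras_get_auto_reboot() are hypothetical helper
names, not driver functions. The sketch relies only on the usual behavior of
files created with debugfs_create_bool(): they accept "1"/"0" (or "y"/"n") on
write and read back as "Y" or "N".

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: write a boolean to a
 * debugfs bool file such as the auto_reboot node.
 * Returns 0 on success, -1 on failure.
 */
static int ras_set_auto_reboot(const char *path, int enable)
{
	FILE *fp = fopen(path, "w");

	if (!fp)
		return -1;
	if (fputs(enable ? "1\n" : "0\n", fp) == EOF) {
		fclose(fp);
		return -1;
	}
	/* fclose() flushes the buffer, so a write error can surface here. */
	return fclose(fp) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: read the node back.
 * Returns 1 if enabled, 0 if disabled, -1 on error or unknown content.
 */
static int ras_get_auto_reboot(const char *path)
{
	char buf[4] = "";
	FILE *fp = fopen(path, "r");

	if (!fp)
		return -1;
	if (!fgets(buf, sizeof(buf), fp)) {
		fclose(fp);
		return -1;
	}
	fclose(fp);

	/* debugfs bool files read as "Y\n" or "N\n"; accept 1/0 as well. */
	if (buf[0] == 'Y' || buf[0] == 'y' || buf[0] == '1')
		return 1;
	if (buf[0] == 'N' || buf[0] == 'n' || buf[0] == '0')
		return 0;
	return -1;
}
```

Run as root, ras_set_auto_reboot("/sys/kernel/debug/dri/0/ras/auto_reboot", 1)
would take the place of the old "reboot <block>" write to ras_ctrl, with no
block name needed.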