Re: [PATCH] drm/amd/powerplay: Remove unnecessary comparison statement

2019-09-09 Thread Wang, Kevin(Yang)
Reviewed-by: Kevin Wang 

Best Regards,
Kevin

From: amd-gfx  on behalf of Austin Kim 

Sent: Monday, September 9, 2019 12:31 PM
To: Deucher, Alexander ; airl...@linux.ie 
; dan...@ffwll.ch 
Cc: Zhou, David(ChunMing) ; amd-gfx@lists.freedesktop.org 
; linux-ker...@vger.kernel.org 
; dri-de...@lists.freedesktop.org 
; Koenig, Christian 
Subject: [PATCH] drm/amd/powerplay: Remove unnecessary comparison statement

size contains a non-negative value since it is declared as uint32_t,
so the statement below is always false:
if (size < 0)

Remove the unnecessary comparison.

Signed-off-by: Austin Kim 
---
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c 
b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
index 12c0e46..3c7c68e 100644
--- a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c
@@ -1134,9 +1134,6 @@ static int navi10_set_power_profile_mode(struct 
smu_context *smu, long *input, u
 }

 if (smu->power_profile_mode == PP_SMC_POWER_PROFILE_CUSTOM) {
-   if (size < 0)
-   return -EINVAL;
-
 ret = smu_update_table(smu,
SMU_TABLE_ACTIVITY_MONITOR_COEFF, 
WORKLOAD_PPLIB_CUSTOM_BIT,
(void *)(&activity_monitor), false);
--
2.6.2

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] drm/amd/powerplay: Add the interface for getting dpm current power state

2019-09-09 Thread Quan, Evan
Reviewed-by: Evan Quan 

> -Original Message-
> From: Liang, Prike 
> Sent: 2019年9月9日 13:22
> To: amd-gfx@lists.freedesktop.org
> Cc: Quan, Evan ; Feng, Kenneth
> ; Huang, Ray ; Liu, Aaron
> ; Liang, Prike 
> Subject: [PATCH] drm/amd/powerplay: Add the interface for getting dpm
> current power state
> 
> Implement the sysfs power_dpm_state interface.
> 
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/powerplay/renoir_ppt.c | 34
> ++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> b/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> index a5cf846..2c22ba4 100644
> --- a/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> +++ b/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> @@ -246,6 +246,38 @@ static int renoir_print_clk_levels(struct smu_context
> *smu,
>   return size;
>  }
> 
> +static enum amd_pm_state_type renoir_get_current_power_state(struct
> +smu_context *smu) {
> + enum amd_pm_state_type pm_type;
> + struct smu_dpm_context *smu_dpm_ctx = &(smu->smu_dpm);
> +
> + if (!smu_dpm_ctx->dpm_context ||
> + !smu_dpm_ctx->dpm_current_power_state)
> + return -EINVAL;
> +
> + mutex_lock(&(smu->mutex));
> + switch (smu_dpm_ctx->dpm_current_power_state-
> >classification.ui_label) {
> + case SMU_STATE_UI_LABEL_BATTERY:
> + pm_type = POWER_STATE_TYPE_BATTERY;
> + break;
> + case SMU_STATE_UI_LABEL_BALLANCED:
> + pm_type = POWER_STATE_TYPE_BALANCED;
> + break;
> + case SMU_STATE_UI_LABEL_PERFORMANCE:
> + pm_type = POWER_STATE_TYPE_PERFORMANCE;
> + break;
> + default:
> + if (smu_dpm_ctx->dpm_current_power_state-
> >classification.flags & SMU_STATE_CLASSIFICATION_FLAG_BOOT)
> + pm_type = POWER_STATE_TYPE_INTERNAL_BOOT;
> + else
> + pm_type = POWER_STATE_TYPE_DEFAULT;
> + break;
> + }
> + mutex_unlock(&(smu->mutex));
> +
> + return pm_type;
> +}
> +
>  static const struct pptable_funcs renoir_ppt_funcs = {
>   .get_smu_msg_index = renoir_get_smu_msg_index,
>   .get_smu_table_index = renoir_get_smu_table_index, @@ -253,6
> +285,8 @@ static const struct pptable_funcs renoir_ppt_funcs = {
>   .set_power_state = NULL,
>   .get_dpm_uclk_limited = renoir_get_dpm_uclk_limited,
>   .print_clk_levels = renoir_print_clk_levels,
> + .get_current_power_state = renoir_get_current_power_state,
> +
>  };
> 
>  void renoir_set_ppt_funcs(struct smu_context *smu)
> --
> 2.7.4


Re: gnome-shell stuck because of amdgpu driver [5.3 RC5]

2019-09-09 Thread Koenig, Christian
I agree with Daniels analysis.

It looks like the problem is simply that PM turns off a block before all
work on that block is done.

Have you opened a bug report yet? If not, that would certainly help,
because it is really hard to extract all the necessary information from
this mail thread.

Regards,
Christian.

Am 08.09.19 um 23:24 schrieb Mikhail Gavrilov:
> On Thu, 5 Sep 2019 at 12:58, Daniel Vetter  wrote:
>> I think those fences are only emitted for CS, not display related.
>> Adding Christian König.
> More fresh kernel log with 5.3RC7 - the issue still happens.
> https://pastebin.com/tyxkWJYV
>
>
> --
> Best Regards,
> Mike Gavrilov.
>
> On Thu, 5 Sep 2019 at 12:58, Daniel Vetter  wrote:
>> On Thu, Sep 5, 2019 at 12:27 AM Mikhail Gavrilov
>>  wrote:
>>> On Wed, 4 Sep 2019 at 13:37, Daniel Vetter  wrote:
 Extend your backtrace warning slightly, like

  WARN(r, "we're stuck on fence %pS\n", fence->ops);

 Also adding Harry and Alex, I'm not really working on amdgpu ...
>>> [ 3511.998320] [ cut here ]
>>> [ 3511.998714] we're stuck on fence
>>> amdgpu_fence_ops+0x0/0xc220 [amdgpu]$
>> I think those fences are only emitted for CS, not display related.
>> Adding Christian König.
>> -Daniel
>>
>>> [ 3511.998991] WARNING: CPU: 10 PID: 1811 at
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c:332
>>> amdgpu_fence_wait_empty+0x1c6/0x240 [amdgpu]
>>> [ 3511.999009] Modules linked in: rfcomm fuse xt_CHECKSUM
>>> xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc
>>> nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT
>>> nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack
>>> ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
>>> ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
>>> iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c
>>> ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables
>>> iptable_filter cmac bnep sunrpc vfat fat edac_mce_amd kvm_amd
>>> snd_hda_codec_realtek rtwpci snd_hda_codec_generic kvm ledtrig_audio
>>> snd_hda_codec_hdmi uvcvideo rtw88 videobuf2_vmalloc snd_hda_intel
>>> videobuf2_memops videobuf2_v4l2 irqbypass snd_usb_audio snd_hda_codec
>>> videobuf2_common crct10dif_pclmul snd_usbmidi_lib crc32_pclmul
>>> mac80211 snd_rawmidi videodev snd_hda_core ghash_clmulni_intel btusb
>>> snd_hwdep btrtl snd_seq btbcm btintel snd_seq_device eeepc_wmi
>>> bluetooth xpad joydev mc snd_pcm
>>> [ 3511.999076]  asus_wmi ff_memless cfg80211 sparse_keymap video
>>> wmi_bmof ecdh_generic snd_timer ecc sp5100_tco k10temp snd i2c_piix4
>>> ccp rfkill soundcore libarc4 gpio_amdpt gpio_generic acpi_cpufreq
>>> binfmt_misc ip_tables hid_logitech_hidpp hid_logitech_dj amdgpu
>>> amd_iommu_v2 gpu_sched ttm drm_kms_helper drm crc32c_intel igb dca
>>> nvme i2c_algo_bit nvme_core wmi pinctrl_amd
>>> [ 3511.999126] CPU: 10 PID: 1811 Comm: Xorg Not tainted
>>> 5.3.0-0.rc6.git2.1c.fc32.x86_64 #1
>>> [ 3511.999131] Hardware name: System manufacturer System Product
>>> Name/ROG STRIX X470-I GAMING, BIOS 2703 08/20/2019
>>> [ 3511.999253] RIP: 0010:amdgpu_fence_wait_empty+0x1c6/0x240 [amdgpu]
>>> [ 3511.999278] Code: fe ff ff 31 c0 c3 48 89 ef e8 36 29 04 cb 84 c0
>>> 74 08 48 89 ef e8 8a a9 21 cb 48 8b 75 08 48 c7 c7 2c 16 86 c0 e8 82
>>> b8 b9 ca <0f> 0b b8 ea ff ff ff 5d c3 e8 ec 57 c3 ca 84 c0 0f 85 6f ff
>>> ff ff
>>> [ 3511.999282] RSP: 0018:b9c04170f798 EFLAGS: 00210282
>>> [ 3511.999288] RAX:  RBX: 8d2ce5205a80 RCX: 
>>> 0006
>>> [ 3511.999292] RDX: 0007 RSI: 8d2c5bea4070 RDI: 
>>> 8d2cfb5d9e00
>>> [ 3511.999296] RBP: 8d28becae480 R08: 0331b36fd503 R09: 
>>> 
>>> [ 3511.999299] R10:  R11:  R12: 
>>> 8d2ce520
>>> [ 3511.999303] R13:  R14:  R15: 
>>> 8d2ce154
>>> [ 3511.999308] FS:  7f59a5bc6f00() GS:8d2cfb40()
>>> knlGS:
>>> [ 3511.999311] CS:  0010 DS:  ES:  CR0: 80050033
>>> [ 3511.999315] CR2: 1108bc475960 CR3: 00075bf32000 CR4: 
>>> 003406e0
>>> [ 3511.999319] Call Trace:
>>> [ 3511.999394]  amdgpu_pm_compute_clocks+0x70/0x5f0 [amdgpu]
>>> [ 3511.999503]  dm_pp_apply_display_requirements+0x1a8/0x1c0 [amdgpu]
>>> [ 3511.999609]  dce12_update_clocks+0xd8/0x110 [amdgpu]
>>> [ 3511.999712]  dc_commit_state+0x414/0x590 [amdgpu]
>>> [ 3511.999725]  ? find_held_lock+0x32/0x90
>>> [ 3511.999832]  amdgpu_dm_atomic_commit_tail+0xd18/0x1cf0 [amdgpu]
>>> [ 3511.999844]  ? reacquire_held_locks+0xed/0x210
>>> [ 3511.999859]  ? ttm_eu_backoff_reservation+0xa5/0x160 [ttm]
>>> [ 3511.999866]  ? find_held_lock+0x32/0x90
>>> [ 3511.999872]  ? find_held_lock+0x32/0x90
>>> [ 3511.999881]  ? __lock_acquire+0x247/0x1910
>>> [ 3511.999893]  ? find_held_lock+0x32/0x90
>>> [ 3511.01]  ? mark_held_locks+0x50/0x80
>>> [ 3511.07]  ? _raw_spin_unlock_irq+0

Re: [PATCH] drm: add drm device name

2019-09-09 Thread Jani Nikula
On Sat, 07 Sep 2019, Daniel Vetter  wrote:
> On Sat, Sep 7, 2019 at 3:18 AM Rob Clark  wrote:
>>
>> On Fri, Sep 6, 2019 at 3:16 PM Marek Olšák  wrote:
>> >
>> > + dri-devel
>> >
>> > On Tue, Sep 3, 2019 at 5:41 PM Jiang, Sonny  wrote:
>> >>
>> >> Add DRM device name and use DRM_IOCTL_VERSION ioctl drmVersion::desc 
>> >> passing it to user space
>> >> instead of unused DRM driver name descriptor.
>> >>
>> >> Change-Id: I809f6d3e057111417efbe8fa7cab8f0113ba4b21
>> >> Signed-off-by: Sonny Jiang 
>> >> ---
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 ++
>> >>  drivers/gpu/drm/drm_drv.c  | 17 +
>> >>  drivers/gpu/drm/drm_ioctl.c|  2 +-
>> >>  include/drm/drm_device.h   |  3 +++
>> >>  include/drm/drm_drv.h  |  1 +
>> >>  5 files changed, 24 insertions(+), 1 deletion(-)
>> >>
>> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> >> index 67b09cb2a9e2..8f0971cea363 100644
>> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> >> @@ -2809,6 +2809,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>> >> /* init the mode config */
>> >> drm_mode_config_init(adev->ddev);
>> >>
>> >> +   drm_dev_set_name(adev->ddev, amdgpu_asic_name[adev->asic_type]);
>> >> +
>> >> r = amdgpu_device_ip_init(adev);
>> >> if (r) {
>> >> /* failed in exclusive mode due to timeout */
>> >> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>> >> index 862621494a93..6c33879bb538 100644
>> >> --- a/drivers/gpu/drm/drm_drv.c
>> >> +++ b/drivers/gpu/drm/drm_drv.c
>> >> @@ -802,6 +802,7 @@ void drm_dev_fini(struct drm_device *dev)
>> >> mutex_destroy(&dev->struct_mutex);
>> >> drm_legacy_destroy_members(dev);
>> >> kfree(dev->unique);
>> >> +   kfree(dev->name);
>> >>  }
>> >>  EXPORT_SYMBOL(drm_dev_fini);
>> >>
>> >> @@ -1078,6 +1079,22 @@ int drm_dev_set_unique(struct drm_device *dev, 
>> >> const char *name)
>> >>  }
>> >>  EXPORT_SYMBOL(drm_dev_set_unique);
>> >>
>> >> +/**
>> >> + * drm_dev_set_name - Set the name of a DRM device
>> >> + * @dev: device of which to set the name
>> >> + * @name: name to be set
>> >> + *
>> >> + * Return: 0 on success or a negative error code on failure.
>> >> + */
>> >> +int drm_dev_set_name(struct drm_device *dev, const char *name)
>> >> +{
>> >> +   kfree(dev->name);
>> >> +   dev->name = kstrdup(name, GFP_KERNEL);
>> >> +
>> >> +   return dev->name ? 0 : -ENOMEM;
>> >> +}
>> >> +EXPORT_SYMBOL(drm_dev_set_name);
>> >> +
>> >>  /*
>> >>   * DRM Core
>> >>   * The DRM core module initializes all global DRM objects and makes them
>> >> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
>> >> index 2263e3ddd822..61f02965106b 100644
>> >> --- a/drivers/gpu/drm/drm_ioctl.c
>> >> +++ b/drivers/gpu/drm/drm_ioctl.c
>> >> @@ -506,7 +506,7 @@ int drm_version(struct drm_device *dev, void *data,
>> >> dev->driver->date);
>> >> if (!err)
>> >> err = drm_copy_field(version->desc, &version->desc_len,
>> >> -   dev->driver->desc);
>> >> +   dev->name);
>>
>> I suspect this needs to be something like dev->name ? dev->name :
>> dev->driver->desc
>>
>> Or somewhere something needs to arrange for dev->name to default to
>> dev->driver->desc
>>
>> And maybe this should be dev->desc instead of dev->name.. that at
>> least seems less confusing to me.
>>
>> other than that, I don't see a big problem
>
> (recap from irc)
>
> I thought we're using this as essentially an uapi identifier, so that
> you know which kind of ioctl set a driver supports. Not so big deal on
> pci, where we match against pci ids anyway, kinda bigger deal where
> that's not around. Listing codenames and or something else that
> changes all the time feels a bit silly for that. Imo if you just want
> to expose this to userspace, stuff it into an amdgpu info/query ioctl.
>
> So what do you need this for exactly, where's the userspace that needs
> this?

Indeed; using this e.g. for changing userspace behaviour would seem
wrong.

BR,
Jani.


> -Daniel
>
>>
>> BR,
>> -R
>>
>> >>
>> >> return err;
>> >>  }
>> >> diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
>> >> index 7f9ef709b2b6..e29912c484e4 100644
>> >> --- a/include/drm/drm_device.h
>> >> +++ b/include/drm/drm_device.h
>> >> @@ -123,6 +123,9 @@ struct drm_device {
>> >> /** @unique: Unique name of the device */
>> >> char *unique;
>> >>
>> >> +   /** @name: device name */
>> >> +   char *name;
>> >> +
>> >> /**
>> >>  * @struct_mutex:
>> >>  *
>> >> diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
>> >> index 68ca736c548d..f742e2bde467 100644
>> >> --- a/include/drm/drm_drv

[PATCH] drm/amdgpu: Add SRIOV mailbox backend for Navi1x

2019-09-09 Thread jianzh
From: Jiange Zhao 

Mimicking the Vega10 implementation, add a mailbox backend for Navi1x.

Signed-off-by: Jiange Zhao 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 380 ++
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h |  41 +++
 3 files changed, 422 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 84614a71bb4d..43dc4aa18930 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -68,7 +68,7 @@ amdgpu-$(CONFIG_DRM_AMDGPU_SI)+= si.o gmc_v6_0.o gfx_v6_0.o 
si_ih.o si_dma.o dce
 amdgpu-y += \
vi.o mxgpu_vi.o nbio_v6_1.o soc15.o emu_soc.o mxgpu_ai.o nbio_v7_0.o 
vega10_reg_init.o \
vega20_reg_init.o nbio_v7_4.o nbio_v2_3.o nv.o navi10_reg_init.o 
navi14_reg_init.o \
-   arct_reg_init.o navi12_reg_init.o
+   arct_reg_init.o navi12_reg_init.o mxgpu_nv.o
 
 # add DF block
 amdgpu-y += \
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
new file mode 100644
index ..0d8767eb7a70
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ -0,0 +1,380 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+#include "nbio/nbio_2_3_offset.h"
+#include "nbio/nbio_2_3_sh_mask.h"
+#include "gc/gc_10_1_0_offset.h"
+#include "gc/gc_10_1_0_sh_mask.h"
+#include "soc15.h"
+#include "navi10_ih.h"
+#include "soc15_common.h"
+#include "mxgpu_nv.h"
+#include "mxgpu_ai.h"
+
+static void xgpu_nv_mailbox_send_ack(struct amdgpu_device *adev)
+{
+   WREG8(NV_MAIBOX_CONTROL_RCV_OFFSET_BYTE, 2);
+}
+
+static void xgpu_nv_mailbox_set_valid(struct amdgpu_device *adev, bool val)
+{
+   WREG8(NV_MAIBOX_CONTROL_TRN_OFFSET_BYTE, val ? 1 : 0);
+}
+
+/*
+ * this peek_msg may *only* be called from the IRQ routine, because in the
+ * IRQ routine the RCV_MSG_VALID field of BIF_BX_PF_MAILBOX_CONTROL has
+ * already been set to 1 by the host.
+ *
+ * if not called from the IRQ routine, this peek_msg is not guaranteed to
+ * return the correct value, since it doesn't return RCV_DW0 in the case
+ * where RCV_MSG_VALID is set by the host.
+ */
+static enum idh_event xgpu_nv_mailbox_peek_msg(struct amdgpu_device *adev)
+{
+   return RREG32_NO_KIQ(SOC15_REG_OFFSET(NBIO, 0,
+   mmBIF_BX_PF_MAILBOX_MSGBUF_RCV_DW0));
+}
+
+
+static int xgpu_nv_mailbox_rcv_msg(struct amdgpu_device *adev,
+  enum idh_event event)
+{
+   u32 reg;
+
+   reg = RREG32_NO_KIQ(SOC15_REG_OFFSET(NBIO, 0,
+
mmBIF_BX_PF_MAILBOX_MSGBUF_RCV_DW0));
+   if (reg != event)
+   return -ENOENT;
+
+   xgpu_nv_mailbox_send_ack(adev);
+
+   return 0;
+}
+
+static uint8_t xgpu_nv_peek_ack(struct amdgpu_device *adev)
+{
+   return RREG8(NV_MAIBOX_CONTROL_TRN_OFFSET_BYTE) & 2;
+}
+
+static int xgpu_nv_poll_ack(struct amdgpu_device *adev)
+{
+   int timeout  = NV_MAILBOX_POLL_ACK_TIMEDOUT;
+   u8 reg;
+
+   do {
+   reg = RREG8(NV_MAIBOX_CONTROL_TRN_OFFSET_BYTE);
+   if (reg & 2)
+   return 0;
+
+   mdelay(5);
+   timeout -= 5;
+   } while (timeout > 1);
+
+   pr_err("Didn't get TRN_MSG_ACK from pf in %d msec\n", 
NV_MAILBOX_POLL_ACK_TIMEDOUT);
+
+   return -ETIME;
+}
+
+static int xgpu_nv_poll_msg(struct amdgpu_device *adev, enum idh_event event)
+{
+   int r, timeout = NV_MAILBOX_POLL_MSG_TIMEDOUT;
+
+   do {
+   r = xgpu_nv_mailbox_rcv_msg(adev, event);
+   if (!r)
+   return 0;
+
+   msleep(10);
+   timeout -= 10;
+   } while (timeout > 1);
+

RE: [PATCH] drm/amdgpu: add navi14 PCI ID for work station SKU

2019-09-09 Thread Xu, Feifei


Reviewed-by: Feifei Xu 

From: Yin, Tianci (Rico) 
Sent: Friday, September 6, 2019 3:37 PM
To: amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Zhang, Hawking ; 
Xiao, Jack ; Yuan, Xiaojie ; Long, 
Gang 
Subject: [PATCH] drm/amdgpu: add navi14 PCI ID for work station SKU



Re: [PATCH] drm/amdgpu: add navi14 PCI ID for work station SKU

2019-09-09 Thread Yin, Tianci (Rico)
Thanks Feifei!

From: Xu, Feifei 
Sent: Monday, September 9, 2019 18:43
To: Yin, Tianci (Rico) ; amd-gfx@lists.freedesktop.org 

Cc: Zhang, Hawking ; Xiao, Jack ; 
Yuan, Xiaojie ; Long, Gang 
Subject: RE: [PATCH] drm/amdgpu: add navi14 PCI ID for work station SKU






Reviewed-by: Feifei Xu 



From: Yin, Tianci (Rico) 
Sent: Friday, September 6, 2019 3:37 PM
To: amd-gfx@lists.freedesktop.org
Cc: Xu, Feifei ; Zhang, Hawking ; 
Xiao, Jack ; Yuan, Xiaojie ; Long, 
Gang 
Subject: [PATCH] drm/amdgpu: add navi14 PCI ID for work station SKU





Re: [PATCH 1/1] drm/amdgpu: Disable retry faults in VMID0

2019-09-09 Thread Christian König

Am 05.09.19 um 01:31 schrieb Kuehling, Felix:

There is no point retrying page faults in VMID0. Those faults are
always fatal.

Signed-off-by: Felix Kuehling 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 2 ++
  drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c | 2 ++
  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 2 ++
  drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c  | 2 ++
  drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c  | 2 ++
  5 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 6ce37ce77d14..9ec4297e61e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -178,6 +178,8 @@ static void gfxhub_v1_0_enable_system_domain(struct 
amdgpu_device *adev)
tmp = RREG32_SOC15(GC, 0, mmVM_CONTEXT0_CNTL);
tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL, ENABLE_CONTEXT, 1);
tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL, PAGE_TABLE_DEPTH, 0);
+   tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL,
+   RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
WREG32_SOC15(GC, 0, mmVM_CONTEXT0_CNTL, tmp);
  }
  
diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c

index 8b789f750b72..a9238735d361 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v2_0.c
@@ -166,6 +166,8 @@ static void gfxhub_v2_0_enable_system_domain(struct 
amdgpu_device *adev)
tmp = RREG32_SOC15(GC, 0, mmGCVM_CONTEXT0_CNTL);
tmp = REG_SET_FIELD(tmp, GCVM_CONTEXT0_CNTL, ENABLE_CONTEXT, 1);
tmp = REG_SET_FIELD(tmp, GCVM_CONTEXT0_CNTL, PAGE_TABLE_DEPTH, 0);
+   tmp = REG_SET_FIELD(tmp, GCVM_CONTEXT0_CNTL,
+   RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
WREG32_SOC15(GC, 0, mmGCVM_CONTEXT0_CNTL, tmp);
  }
  
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c

index b9d6c0bfa594..4c7e8c64a94e 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -207,6 +207,8 @@ static void mmhub_v1_0_enable_system_domain(struct 
amdgpu_device *adev)
tmp = RREG32_SOC15(MMHUB, 0, mmVM_CONTEXT0_CNTL);
tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL, ENABLE_CONTEXT, 1);
tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL, PAGE_TABLE_DEPTH, 0);
+   tmp = REG_SET_FIELD(tmp, VM_CONTEXT0_CNTL,
+   RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
WREG32_SOC15(MMHUB, 0, mmVM_CONTEXT0_CNTL, tmp);
  }
  
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c

index 3542c203c3c8..86ed8cb915a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c
@@ -152,6 +152,8 @@ static void mmhub_v2_0_enable_system_domain(struct 
amdgpu_device *adev)
tmp = RREG32_SOC15(MMHUB, 0, mmMMVM_CONTEXT0_CNTL);
tmp = REG_SET_FIELD(tmp, MMVM_CONTEXT0_CNTL, ENABLE_CONTEXT, 1);
tmp = REG_SET_FIELD(tmp, MMVM_CONTEXT0_CNTL, PAGE_TABLE_DEPTH, 0);
+   tmp = REG_SET_FIELD(tmp, MMVM_CONTEXT0_CNTL,
+   RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
WREG32_SOC15(MMHUB, 0, mmMMVM_CONTEXT0_CNTL, tmp);
  }
  
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c

index 0cf7ef44b4b5..657970f9ebfb 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c
@@ -240,6 +240,8 @@ static void mmhub_v9_4_enable_system_domain(struct 
amdgpu_device *adev,
  hubid * MMHUB_INSTANCE_REGISTER_OFFSET);
tmp = REG_SET_FIELD(tmp, VML2VC0_VM_CONTEXT0_CNTL, ENABLE_CONTEXT, 1);
tmp = REG_SET_FIELD(tmp, VML2VC0_VM_CONTEXT0_CNTL, PAGE_TABLE_DEPTH, 0);
+   tmp = REG_SET_FIELD(tmp, VML2VC0_VM_CONTEXT0_CNTL,
+   RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
WREG32_SOC15_OFFSET(MMHUB, 0, mmVML2VC0_VM_CONTEXT0_CNTL,
hubid * MMHUB_INSTANCE_REGISTER_OFFSET, tmp);
  }



Re: [PATCH 1/1] drm/amdgpu: Fix KFD-related kernel oops on Hawaii

2019-09-09 Thread Christian König

Am 06.09.19 um 01:29 schrieb Kuehling, Felix:

Hawaii needs to flush caches explicitly, submitting an IB in a user
VMID from kernel mode. There is no s_fence in this case.

Fixes: eb3961a57424 ("drm/amdgpu: remove fence context from the job")
Signed-off-by: Felix Kuehling 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 6882eeb93b4e..d81e141a33fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -141,7 +141,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
/* ring tests don't use a job */
if (job) {
vm = job->vm;
-   fence_ctx = job->base.s_fence->scheduled.context;
+   fence_ctx = job->base.s_fence ?
+   job->base.s_fence->scheduled.context : 0;
} else {
vm = NULL;
fence_ctx = 0;



[PATCH 1/2] drm/amd/powerplay: issue DC-BTC for arcturus on SMU init

2019-09-09 Thread Quan, Evan
Need to perform DC-BTC for arcturus on bootup.

Change-Id: I7f048ba17cafe8909c5ee1e00830e4f8527d1a05
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c  |  4 ++--
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c| 17 -
 drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h  |  6 +++---
 .../gpu/drm/amd/powerplay/inc/arcturus_ppsmc.h  |  3 +--
 drivers/gpu/drm/amd/powerplay/inc/smu_types.h   |  1 +
 drivers/gpu/drm/amd/powerplay/vega20_ppt.c  |  2 +-
 6 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c 
b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 2602d9fa2d77..f13e134be42e 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -1079,8 +1079,8 @@ static int smu_smc_table_hw_init(struct smu_context *smu,
if (ret)
return ret;
 
-   /* issue RunAfllBtc msg */
-   ret = smu_run_afll_btc(smu);
+   /* issue Run*Btc msg */
+   ret = smu_run_btc(smu);
if (ret)
return ret;
 
diff --git a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c 
b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
index 61cce5fed29f..7f6fc6d9a181 100644
--- a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
@@ -112,8 +112,7 @@ static struct smu_11_0_cmn2aisc_mapping 
arcturus_message_map[SMU_MSG_MAX_COUNT]
MSG_MAP(PrepareMp1ForShutdown,   
PPSMC_MSG_PrepareMp1ForShutdown),
MSG_MAP(SoftReset,   PPSMC_MSG_SoftReset),
MSG_MAP(RunAfllBtc,  PPSMC_MSG_RunAfllBtc),
-   MSG_MAP(RunGfxDcBtc, PPSMC_MSG_RunGfxDcBtc),
-   MSG_MAP(RunSocDcBtc, PPSMC_MSG_RunSocDcBtc),
+   MSG_MAP(RunDcBtc,PPSMC_MSG_RunDcBtc),
MSG_MAP(DramLogSetDramAddrHigh,  
PPSMC_MSG_DramLogSetDramAddrHigh),
MSG_MAP(DramLogSetDramAddrLow,   
PPSMC_MSG_DramLogSetDramAddrLow),
MSG_MAP(DramLogSetDramSize,  
PPSMC_MSG_DramLogSetDramSize),
@@ -547,9 +546,17 @@ static int arcturus_append_powerplay_table(struct 
smu_context *smu)
return 0;
 }
 
-static int arcturus_run_btc_afll(struct smu_context *smu)
+static int arcturus_run_btc(struct smu_context *smu)
 {
-   return smu_send_smc_msg(smu, SMU_MSG_RunAfllBtc);
+   int ret = 0;
+
+   ret = smu_send_smc_msg(smu, SMU_MSG_RunAfllBtc);
+   if (ret) {
+   pr_err("RunAfllBtc failed!\n");
+   return ret;
+   }
+
+   return smu_send_smc_msg(smu, SMU_MSG_RunDcBtc);
 }
 
 static int arcturus_populate_umd_state_clk(struct smu_context *smu)
@@ -2307,7 +2314,7 @@ static const struct pptable_funcs arcturus_ppt_funcs = {
/* init dpm */
.get_allowed_feature_mask = arcturus_get_allowed_feature_mask,
/* btc */
-   .run_afll_btc = arcturus_run_btc_afll,
+   .run_btc = arcturus_run_btc,
/* dpm/clk tables */
.set_default_dpm_table = arcturus_set_default_dpm_table,
.populate_umd_state_clk = arcturus_populate_umd_state_clk,
diff --git a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
index 3c69065b029e..3de88d084615 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
@@ -427,7 +427,7 @@ struct pptable_funcs {
int (*get_smu_table_index)(struct smu_context *smu, uint32_t index);
int (*get_smu_power_index)(struct smu_context *smu, uint32_t index);
int (*get_workload_type)(struct smu_context *smu, enum 
PP_SMC_POWER_PROFILE profile);
-   int (*run_afll_btc)(struct smu_context *smu);
+   int (*run_btc)(struct smu_context *smu);
int (*get_allowed_feature_mask)(struct smu_context *smu, uint32_t 
*feature_mask, uint32_t num);
enum amd_pm_state_type (*get_current_power_state)(struct smu_context 
*smu);
int (*set_default_dpm_table)(struct smu_context *smu);
@@ -745,8 +745,8 @@ struct smu_funcs
((smu)->ppt_funcs? ((smu)->ppt_funcs->get_smu_power_index? 
(smu)->ppt_funcs->get_smu_power_index((smu), (src)) : -EINVAL) : -EINVAL)
 #define smu_workload_get_type(smu, profile) \
((smu)->ppt_funcs? ((smu)->ppt_funcs->get_workload_type? 
(smu)->ppt_funcs->get_workload_type((smu), (profile)) : -EINVAL) : -EINVAL)
-#define smu_run_afll_btc(smu) \
-   ((smu)->ppt_funcs? ((smu)->ppt_funcs->run_afll_btc? 
(smu)->ppt_funcs->run_afll_btc((smu)) : 0) : 0)
+#define smu_run_btc(smu) \
+   ((smu)->ppt_funcs? ((smu)->ppt_funcs->run_btc? 
(smu)->ppt_funcs->run_btc((smu)) : 0) : 0)
 #define smu_get_allowed_feature_mask(smu, feature_mask, num) \
((smu)->ppt_funcs? ((smu)->ppt_funcs->get_allowed_feature_mask? 
(smu)->ppt_funcs->get_allowed_feature_mask((smu), (feature_mask), (num)) : 0) : 
0)
 #define smu_set_deep_

[PATCH 2/2] drm/amd/powerplay: update smu11_driver_if_arcturus.h

2019-09-09 Thread Quan, Evan
Also bump the SMU11_DRIVER_IF_VERSION_ARCT.

Change-Id: I786047d93bf4e1f0905069e2c742479740778fe6
Signed-off-by: Evan Quan 
---
 .../gpu/drm/amd/powerplay/inc/smu11_driver_if_arcturus.h| 6 +-
 drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h   | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_arcturus.h 
b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_arcturus.h
index e02950b505fa..40a51a141336 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_arcturus.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_arcturus.h
@@ -696,7 +696,11 @@ typedef struct {
   uint8_t  GpioI2cSda;  // Serial Data
   uint16_t GpioPadding;
 
-  uint32_t BoardReserved[9];
+  // Platform input telemetry voltage coefficient
+  uint32_t BoardVoltageCoeffA;// decode by /1000
+  uint32_t BoardVoltageCoeffB;// decode by /1000
+
+  uint32_t BoardReserved[7];
 
   // Padding for MMHUB - do not modify this
   uint32_t MmHubPadding[8]; // SMU internal use
diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h 
b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
index b1e370e19d22..3b9e3a277ded 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
@@ -27,7 +27,7 @@
 
 #define SMU11_DRIVER_IF_VERSION_INV 0x
 #define SMU11_DRIVER_IF_VERSION_VG20 0x13
-#define SMU11_DRIVER_IF_VERSION_ARCT 0x09
+#define SMU11_DRIVER_IF_VERSION_ARCT 0x0A
 #define SMU11_DRIVER_IF_VERSION_NV10 0x33
 #define SMU11_DRIVER_IF_VERSION_NV12 0x33
 #define SMU11_DRIVER_IF_VERSION_NV14 0x34
-- 
2.23.0


Re: [PATCH] drm/amdgpu: Allow to reset the EEPROM table.

2019-09-09 Thread Christian König

We should actually try to get rid of the ras_ctl debugfs in the long term.

Overloading that with different functionality was a really bad idea in 
the first place.


Christian.

Am 07.09.19 um 19:50 schrieb Grodzovsky, Andrey:

What about adding new value to existing ras_ctl debugfs file ?

Andrey


From: Alex Deucher 
Sent: 07 September 2019 09:42:47
To: Grodzovsky, Andrey
Cc: amd-gfx list; Zhou1, Tao; Chen, Guchun
Subject: Re: [PATCH] drm/amdgpu: Allow to reset the EEPROM table.

On Fri, Sep 6, 2019 at 11:13 AM Andrey Grodzovsky
 wrote:

The table grows quickly during debug/development when multiple RAS
errors are injected. Allow avoiding this by setting the table header
back to empty if needed.


Please make this a debugfs file rather than a module parameter so that
it can be updated at runtime and more easily handled on a per-card
basis.

Alex


Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 8 
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 3 ++-
  3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0d11aa8..405c55a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -169,6 +169,7 @@ extern int amdgpu_discovery;
  extern int amdgpu_mes;
  extern int amdgpu_noretry;
  extern int amdgpu_force_asic_type;
+extern int amdgpu_ras_eeprom_reset;

  #ifdef CONFIG_DRM_AMDGPU_SI
  extern int amdgpu_si_support;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 5a7f929..6e101a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -145,6 +145,7 @@ int amdgpu_discovery = -1;
  int amdgpu_mes = 0;
  int amdgpu_noretry = 1;
  int amdgpu_force_asic_type = -1;
+int amdgpu_ras_eeprom_reset = -1;

  struct amdgpu_mgpu_info mgpu_info = {
 .mutex = __MUTEX_INITIALIZER(mgpu_info.mutex),
@@ -530,6 +531,13 @@ MODULE_PARM_DESC(ras_mask, "Mask of RAS features to enable 
(default 0x),
  module_param_named(ras_mask, amdgpu_ras_mask, uint, 0444);

  /**
+ * DOC: ras_eeprom_reset (int)
+ * Reset EEPROM table to zerro entries.
+ */
+MODULE_PARM_DESC(ras_eeprom_reset, "Reset RAS EEPROM table to zerro entries (1 = 
reset, -1 = auto (default - don't reset)");
+module_param_named(ras_eeprom_reset, amdgpu_ras_eeprom_reset, int, 0444);
+
+/**
   * DOC: si_support (int)
   * Set SI support driver. This parameter works after set config 
CONFIG_DRM_AMDGPU_SI. For SI asic, when radeon driver is enabled,
   * set value 0 to use radeon driver, while set value 1 to use amdgpu driver. 
The default is using radeon driver when it available,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 43dd4ab..75c6fc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -140,7 +140,8 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control 
*control)

 __decode_table_header_from_buff(hdr, &buff[2]);

-   if (hdr->header == EEPROM_TABLE_HDR_VAL) {
+   if (amdgpu_ras_eeprom_reset != 1 &&
+   hdr->header == EEPROM_TABLE_HDR_VAL) {
 control->num_recs = (hdr->tbl_size - EEPROM_TABLE_HEADER_SIZE) 
/
 EEPROM_TABLE_RECORD_SIZE;
 DRM_DEBUG_DRIVER("Found existing EEPROM table with %d records",
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: grab the id mgr lock while accessing passid_mapping

2019-09-09 Thread Christian König
Need to make sure that we are actually dropping the right fence.
Could be done with RCU as well, but that is too complicated for a fix.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index b285ab25146d..e11764164cbf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1036,10 +1036,8 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
amdgpu_job *job, bool need_
id->oa_base != job->oa_base ||
id->oa_size != job->oa_size);
bool vm_flush_needed = job->vm_needs_flush;
-   bool pasid_mapping_needed = id->pasid != job->pasid ||
-   !id->pasid_mapping ||
-   !dma_fence_is_signaled(id->pasid_mapping);
struct dma_fence *fence = NULL;
+   bool pasid_mapping_needed;
unsigned patch_offset = 0;
int r;
 
@@ -1049,6 +1047,12 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
amdgpu_job *job, bool need_
pasid_mapping_needed = true;
}
 
+   mutex_lock(&id_mgr->lock);
+   if (id->pasid != job->pasid || !id->pasid_mapping ||
+   !dma_fence_is_signaled(id->pasid_mapping))
+   pasid_mapping_needed = true;
+   mutex_unlock(&id_mgr->lock);
+
gds_switch_needed &= !!ring->funcs->emit_gds_switch;
vm_flush_needed &= !!ring->funcs->emit_vm_flush  &&
job->vm_pd_addr != AMDGPU_BO_INVALID_OFFSET;
@@ -1088,9 +1092,11 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
amdgpu_job *job, bool need_
}
 
if (pasid_mapping_needed) {
+   mutex_lock(&id_mgr->lock);
id->pasid = job->pasid;
dma_fence_put(id->pasid_mapping);
id->pasid_mapping = dma_fence_get(fence);
+   mutex_unlock(&id_mgr->lock);
}
dma_fence_put(fence);
 
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

2019-09-09 Thread Christian König

Am 04.09.19 um 22:12 schrieb Yang, Philip:

This series looks nice and clear to me, with two questions embedded below.

Are we going to use a dedicated sdma page queue for the direct VM update path
during a fault?

Thanks,
Philip

On 2019-09-04 11:02 a.m., Christian König wrote:

Next step towards HMM support. For now just silence the retry fault and
optionally redirect the request to the dummy page.

v2: make sure the VM is not destroyed while we handle the fault.

Signed-off-by: Christian König 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
   3 files changed, 80 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 951608fc1925..410d89966a66 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
}
}
   }
+
+/**
+ * amdgpu_vm_handle_fault - graceful handling of VM faults.
+ * @adev: amdgpu device pointer
+ * @pasid: PASID of the VM
+ * @addr: Address of the fault
+ *
+ * Try to gracefully handle a VM fault. Return true if the fault was handled 
and
+ * shouldn't be reported any more.
+ */
+bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
+   uint64_t addr)
+{
+   struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
+   struct amdgpu_bo *root;
+   uint64_t value, flags;
+   struct amdgpu_vm *vm;
+   long r;
+
+   if (!ring->sched.ready)
+   return false;
+
+   spin_lock(&adev->vm_manager.pasid_lock);
+   vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
+   if (vm)
+   root = amdgpu_bo_ref(vm->root.base.bo);
+   else
+   root = NULL;
+   spin_unlock(&adev->vm_manager.pasid_lock);
+
+   if (!root)
+   return false;
+
+   r = amdgpu_bo_reserve(root, true);
+   if (r)
+   goto error_unref;
+
+   spin_lock(&adev->vm_manager.pasid_lock);
+   vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
+   spin_unlock(&adev->vm_manager.pasid_lock);
+

Here we get the vm from the pasid a second time and check whether the PD
bo changed. Is this to handle a vm fault racing with vm destroy?


Yes, exactly.




+   if (!vm || vm->root.base.bo != root)
+   goto error_unlock;
+
+   addr /= AMDGPU_GPU_PAGE_SIZE;
+   flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
+   AMDGPU_PTE_SYSTEM;
+
+   if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
+   /* Redirect the access to the dummy page */
+   value = adev->dummy_page_addr;
+   flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
+   AMDGPU_PTE_WRITEABLE;
+   } else {
+   value = 0;
+   }
+
+   r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
+   flags, value, NULL, NULL);
+   if (r)
+   goto error_unlock;
+

After the fault address is redirected to the dummy page, will the fault
recover and the retry continue to execute?


Yes, the read/write operation will just retry and use the value from the 
dummy page instead.



Is it dangerous to update the PTE to use system
memory address 0?


What are you talking about? The dummy page is a page allocated by TTM
to which we redirect faulty accesses.


Regards,
Christian.




+   r = amdgpu_vm_update_pdes(adev, vm, true);
+
+error_unlock:
+   amdgpu_bo_unreserve(root);
+   if (r < 0)
+   DRM_ERROR("Can't handle page fault (%ld)\n", r);
+
+error_unref:
+   amdgpu_bo_unref(&root);
+
+   return false;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 0a97dc839f3b..4dbbe1b6b413 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct amdgpu_device 
*adev);
   
   void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,

 struct amdgpu_task_info *task_info);
+bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
+   uint64_t addr);
   
   void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
   
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c

index 9d15679df6e0..15a1ce51befa 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device 
*adev,
}
   
   	/* If it's the first fault for this address, process it normally */

+   if (retry_fault && !in_interrupt() &&
+   amdgpu_vm_handle_fault(adev, entry->pasid, addr))
+   return 1; /* This also prevents sending it to KFD */

Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

2019-09-09 Thread Christian König

Am 05.09.19 um 00:47 schrieb Kuehling, Felix:

On 2019-09-04 11:02 a.m., Christian König wrote:

Next step towards HMM support. For now just silence the retry fault and
optionally redirect the request to the dummy page.

v2: make sure the VM is not destroyed while we handle the fault.

Signed-off-by: Christian König 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
   3 files changed, 80 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 951608fc1925..410d89966a66 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
}
}
   }
+
+/**
+ * amdgpu_vm_handle_fault - graceful handling of VM faults.
+ * @adev: amdgpu device pointer
+ * @pasid: PASID of the VM
+ * @addr: Address of the fault
+ *
+ * Try to gracefully handle a VM fault. Return true if the fault was handled 
and
+ * shouldn't be reported any more.
+ */
+bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
+   uint64_t addr)
+{
+   struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
+   struct amdgpu_bo *root;
+   uint64_t value, flags;
+   struct amdgpu_vm *vm;
+   long r;
+
+   if (!ring->sched.ready)
+   return false;
+
+   spin_lock(&adev->vm_manager.pasid_lock);
+   vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
+   if (vm)
+   root = amdgpu_bo_ref(vm->root.base.bo);
+   else
+   root = NULL;
+   spin_unlock(&adev->vm_manager.pasid_lock);
+
+   if (!root)
+   return false;
+
+   r = amdgpu_bo_reserve(root, true);
+   if (r)
+   goto error_unref;
+
+   spin_lock(&adev->vm_manager.pasid_lock);
+   vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
+   spin_unlock(&adev->vm_manager.pasid_lock);

I think this deserves a comment. If I understand it correctly, you're
looking up the vm twice so that you have the VM root reservation to
protect against use-after-free. Otherwise the vm pointer is only valid
as long as you're holding the spin-lock.



+
+   if (!vm || vm->root.base.bo != root)

The check of vm->root.base.bo should probably still be under the
spin_lock. Because you're not sure yet it's the right VM, you can't rely
on the reservation here to prevent use-after-free.


Good point, going to fix that.





+   goto error_unlock;
+
+   addr /= AMDGPU_GPU_PAGE_SIZE;
+   flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
+   AMDGPU_PTE_SYSTEM;
+
+   if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
+   /* Redirect the access to the dummy page */
+   value = adev->dummy_page_addr;
+   flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
+   AMDGPU_PTE_WRITEABLE;
+   } else {
+   value = 0;
+   }
+
+   r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
+   flags, value, NULL, NULL);
+   if (r)
+   goto error_unlock;
+
+   r = amdgpu_vm_update_pdes(adev, vm, true);
+
+error_unlock:
+   amdgpu_bo_unreserve(root);
+   if (r < 0)
+   DRM_ERROR("Can't handle page fault (%ld)\n", r);
+
+error_unref:
+   amdgpu_bo_unref(&root);
+
+   return false;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 0a97dc839f3b..4dbbe1b6b413 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct amdgpu_device 
*adev);
   
   void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,

 struct amdgpu_task_info *task_info);
+bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
+   uint64_t addr);
   
   void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
   
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c

index 9d15679df6e0..15a1ce51befa 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -353,6 +353,10 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device 
*adev,
}
   
   	/* If it's the first fault for this address, process it normally */

+   if (retry_fault && !in_interrupt() &&
+   amdgpu_vm_handle_fault(adev, entry->pasid, addr))
+   return 1; /* This also prevents sending it to KFD */

The !in_interrupt() is meant to only do this on the rerouted interrupt
ring that's handled by a worker function?


Yes, exactly. But I plan to add a workaround where the CPU redirects 

Re: Graceful page fault handling for Vega/Navi

2019-09-09 Thread Christian König

Am 05.09.19 um 00:52 schrieb Kuehling, Felix:

On 2019-09-04 11:02 a.m., Christian König wrote:

Hi everyone,

this series is the next puzzle piece for recoverable page fault handling on 
Vega and Navi.

It adds a new direct scheduler entity for VM updates which is then used to 
update page tables during a fault.

In other words previously an application doing an invalid memory access would 
just hang and/or repeat the invalid access over and over again. Now the 
handling is modified so that the invalid memory access is redirected to the 
dummy page.

This needs the following prerequisites:
a) The firmware must be new enough so allow re-routing of page faults.
b) Fault retry must be enabled using the amdgpu.noretry=0 parameter.
c) Enough free VRAM to allocate page tables to point to the dummy page.

The re-routing of page faults current only works on Vega10, so Vega20 and Navi 
will still need some more time.

Wait, we don't do the page fault rerouting on Vega20 yet? So we're
getting the full brunt of the fault storm on the main interrupt ring?


It's implemented, but the Vega20 firmware fails to enable the 
re-routing for some reason.


I haven't had time yet to talk to the firmware guys why that happens.


In that case, we should probably change the default setting of
amdgpu.noretry=1 at least until that's done.

Other than that the patch series looks reasonable to me. I commented on
patches 4 and 9 separately.

Patch 1 is Acked-by: Felix Kuehling 

With the issues addressed that I pointed out, the rest is

Reviewed-by: Felix Kuehling 


Thanks,
Christian.



Regards,
    Felix



Please review and/or comment,
Christian.


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: [PATCH 1/2] drm/amd/powerplay: issue DC-BTC for arcturus on SMU init

2019-09-09 Thread Wang, Kevin(Yang)
Reviewed-by: Kevin Wang 

From: amd-gfx  on behalf of Quan, Evan 

Sent: Monday, September 9, 2019 7:33 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Li, Candice ; Gui, Jack ; Quan, Evan 

Subject: [PATCH 1/2] drm/amd/powerplay: issue DC-BTC for arcturus on SMU init

Need to perform DC-BTC for arcturus on bootup.

Change-Id: I7f048ba17cafe8909c5ee1e00830e4f8527d1a05
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c  |  4 ++--
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c| 17 -
 drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h  |  6 +++---
 .../gpu/drm/amd/powerplay/inc/arcturus_ppsmc.h  |  3 +--
 drivers/gpu/drm/amd/powerplay/inc/smu_types.h   |  1 +
 drivers/gpu/drm/amd/powerplay/vega20_ppt.c  |  2 +-
 6 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c 
b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 2602d9fa2d77..f13e134be42e 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -1079,8 +1079,8 @@ static int smu_smc_table_hw_init(struct smu_context *smu,
 if (ret)
 return ret;

-   /* issue RunAfllBtc msg */
-   ret = smu_run_afll_btc(smu);
+   /* issue Run*Btc msg */
+   ret = smu_run_btc(smu);
 if (ret)
 return ret;

diff --git a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c 
b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
index 61cce5fed29f..7f6fc6d9a181 100644
--- a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
@@ -112,8 +112,7 @@ static struct smu_11_0_cmn2aisc_mapping 
arcturus_message_map[SMU_MSG_MAX_COUNT]
 MSG_MAP(PrepareMp1ForShutdown,   
PPSMC_MSG_PrepareMp1ForShutdown),
 MSG_MAP(SoftReset,   PPSMC_MSG_SoftReset),
 MSG_MAP(RunAfllBtc,  PPSMC_MSG_RunAfllBtc),
-   MSG_MAP(RunGfxDcBtc, PPSMC_MSG_RunGfxDcBtc),
-   MSG_MAP(RunSocDcBtc, PPSMC_MSG_RunSocDcBtc),
+   MSG_MAP(RunDcBtc,PPSMC_MSG_RunDcBtc),
 MSG_MAP(DramLogSetDramAddrHigh,  
PPSMC_MSG_DramLogSetDramAddrHigh),
 MSG_MAP(DramLogSetDramAddrLow,   
PPSMC_MSG_DramLogSetDramAddrLow),
 MSG_MAP(DramLogSetDramSize,  
PPSMC_MSG_DramLogSetDramSize),
@@ -547,9 +546,17 @@ static int arcturus_append_powerplay_table(struct 
smu_context *smu)
 return 0;
 }

-static int arcturus_run_btc_afll(struct smu_context *smu)
+static int arcturus_run_btc(struct smu_context *smu)
 {
-   return smu_send_smc_msg(smu, SMU_MSG_RunAfllBtc);
+   int ret = 0;
+
+   ret = smu_send_smc_msg(smu, SMU_MSG_RunAfllBtc);
+   if (ret) {
+   pr_err("RunAfllBtc failed!\n");
+   return ret;
+   }
+
+   return smu_send_smc_msg(smu, SMU_MSG_RunDcBtc);
 }

 static int arcturus_populate_umd_state_clk(struct smu_context *smu)
@@ -2307,7 +2314,7 @@ static const struct pptable_funcs arcturus_ppt_funcs = {
 /* init dpm */
 .get_allowed_feature_mask = arcturus_get_allowed_feature_mask,
 /* btc */
-   .run_afll_btc = arcturus_run_btc_afll,
+   .run_btc = arcturus_run_btc,
 /* dpm/clk tables */
 .set_default_dpm_table = arcturus_set_default_dpm_table,
 .populate_umd_state_clk = arcturus_populate_umd_state_clk,
diff --git a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
index 3c69065b029e..3de88d084615 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
@@ -427,7 +427,7 @@ struct pptable_funcs {
 int (*get_smu_table_index)(struct smu_context *smu, uint32_t index);
 int (*get_smu_power_index)(struct smu_context *smu, uint32_t index);
 int (*get_workload_type)(struct smu_context *smu, enum 
PP_SMC_POWER_PROFILE profile);
-   int (*run_afll_btc)(struct smu_context *smu);
+   int (*run_btc)(struct smu_context *smu);
 int (*get_allowed_feature_mask)(struct smu_context *smu, uint32_t 
*feature_mask, uint32_t num);
 enum amd_pm_state_type (*get_current_power_state)(struct smu_context 
*smu);
 int (*set_default_dpm_table)(struct smu_context *smu);
@@ -745,8 +745,8 @@ struct smu_funcs
 ((smu)->ppt_funcs? ((smu)->ppt_funcs->get_smu_power_index? 
(smu)->ppt_funcs->get_smu_power_index((smu), (src)) : -EINVAL) : -EINVAL)
 #define smu_workload_get_type(smu, profile) \
 ((smu)->ppt_funcs? ((smu)->ppt_funcs->get_workload_type? 
(smu)->ppt_funcs->get_workload_type((smu), (profile)) : -EINVAL) : -EINVAL)
-#define smu_run_afll_btc(smu) \
-   ((smu)->ppt_funcs? ((smu)->ppt_funcs->run_afll_btc? 
(smu)->ppt_funcs->run_afll_btc((smu)) : 0) : 0)
+#define smu_run_btc(smu) \
+   ((smu)->ppt_funcs

Re: [PATCH 2/2] drm/amd/powerplay: update smu11_driver_if_arcturus.h

2019-09-09 Thread Wang, Kevin(Yang)
Reviewed-by: Kevin Wang 


From: amd-gfx  on behalf of Quan, Evan 

Sent: Monday, September 9, 2019 7:33 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Li, Candice ; Gui, Jack ; Quan, Evan 

Subject: [PATCH 2/2] drm/amd/powerplay: update smu11_driver_if_arcturus.h

Also bump the SMU11_DRIVER_IF_VERSION_ARCT.

Change-Id: I786047d93bf4e1f0905069e2c742479740778fe6
Signed-off-by: Evan Quan 
---
 .../gpu/drm/amd/powerplay/inc/smu11_driver_if_arcturus.h| 6 +-
 drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h   | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_arcturus.h 
b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_arcturus.h
index e02950b505fa..40a51a141336 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_arcturus.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_arcturus.h
@@ -696,7 +696,11 @@ typedef struct {
   uint8_t  GpioI2cSda;  // Serial Data
   uint16_t GpioPadding;

-  uint32_t BoardReserved[9];
+  // Platform input telemetry voltage coefficient
+  uint32_t BoardVoltageCoeffA;// decode by /1000
+  uint32_t BoardVoltageCoeffB;// decode by /1000
+
+  uint32_t BoardReserved[7];

   // Padding for MMHUB - do not modify this
   uint32_t MmHubPadding[8]; // SMU internal use
diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h 
b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
index b1e370e19d22..3b9e3a277ded 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
@@ -27,7 +27,7 @@

 #define SMU11_DRIVER_IF_VERSION_INV 0x
 #define SMU11_DRIVER_IF_VERSION_VG20 0x13
-#define SMU11_DRIVER_IF_VERSION_ARCT 0x09
+#define SMU11_DRIVER_IF_VERSION_ARCT 0x0A
 #define SMU11_DRIVER_IF_VERSION_NV10 0x33
 #define SMU11_DRIVER_IF_VERSION_NV12 0x33
 #define SMU11_DRIVER_IF_VERSION_NV14 0x34
--
2.23.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amd/powerplay: Add the interface for getting dpm current power state

2019-09-09 Thread Wang, Kevin(Yang)
comment inline.

From: amd-gfx  on behalf of Quan, Evan 

Sent: Monday, September 9, 2019 4:31 PM
To: Liang, Prike ; amd-gfx@lists.freedesktop.org 

Cc: Huang, Ray ; Feng, Kenneth ; Liu, 
Aaron 
Subject: RE: [PATCH] drm/amd/powerplay: Add the interface for getting dpm 
current power state

Reviewed-by: Evan Quan 

> -Original Message-
> From: Liang, Prike 
> Sent: 2019年9月9日 13:22
> To: amd-gfx@lists.freedesktop.org
> Cc: Quan, Evan ; Feng, Kenneth
> ; Huang, Ray ; Liu, Aaron
> ; Liang, Prike 
> Subject: [PATCH] drm/amd/powerplay: Add the interface for getting dpm
> current power state
>
> implement the sysfs power_dpm_state
>
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/powerplay/renoir_ppt.c | 34
> ++
>  1 file changed, 34 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> b/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> index a5cf846..2c22ba4 100644
> --- a/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> +++ b/drivers/gpu/drm/amd/powerplay/renoir_ppt.c
> @@ -246,6 +246,38 @@ static int renoir_print_clk_levels(struct smu_context
> *smu,
>return size;
>  }
>
> +static enum amd_pm_state_type renoir_get_current_power_state(struct
> +smu_context *smu) {
> + enum amd_pm_state_type pm_type;
> + struct smu_dpm_context *smu_dpm_ctx = &(smu->smu_dpm);
> +
> + if (!smu_dpm_ctx->dpm_context ||
> + !smu_dpm_ctx->dpm_current_power_state)
> + return -EINVAL;
> +
> + mutex_lock(&(smu->mutex));
> + switch (smu_dpm_ctx->dpm_current_power_state-
> >classification.ui_label) {
> + case SMU_STATE_UI_LABEL_BATTERY:
> + pm_type = POWER_STATE_TYPE_BATTERY;
> + break;
> + case SMU_STATE_UI_LABEL_BALLANCED:
> + pm_type = POWER_STATE_TYPE_BALANCED;
> + break;
> + case SMU_STATE_UI_LABEL_PERFORMANCE:
> + pm_type = POWER_STATE_TYPE_PERFORMANCE;
> + break;
> + default:
> + if (smu_dpm_ctx->dpm_current_power_state-
> >classification.flags & SMU_STATE_CLASSIFICATION_FLAG_BOOT)
> + pm_type = POWER_STATE_TYPE_INTERNAL_BOOT;
> + else
> + pm_type = POWER_STATE_TYPE_DEFAULT;
> + break;
> + }
> + mutex_unlock(&(smu->mutex));
> +
> + return pm_type;
> +}
> +
>  static const struct pptable_funcs renoir_ppt_funcs = {
>.get_smu_msg_index = renoir_get_smu_msg_index,
>.get_smu_table_index = renoir_get_smu_table_index, @@ -253,6
> +285,8 @@ static const struct pptable_funcs renoir_ppt_funcs = {
>.set_power_state = NULL,
>.get_dpm_uclk_limited = renoir_get_dpm_uclk_limited,
>.print_clk_levels = renoir_print_clk_levels,
> + .get_current_power_state = renoir_get_current_power_state,
> +
[kevin]:
please remove this blank line.
with that fixed:
Reviewed-by: Kevin wang 
>  };
>
>  void renoir_set_ppt_funcs(struct smu_context *smu)
> --
> 2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

2019-09-09 Thread Yang, Philip


On 2019-09-09 8:03 a.m., Christian König wrote:
> Am 04.09.19 um 22:12 schrieb Yang, Philip:
>> This series looks nice and clear to me, with two questions embedded below.
>>
>> Are we going to use a dedicated sdma page queue for the direct VM update path
>> during a fault?
>>
>> Thanks,
>> Philip
>>
>> On 2019-09-04 11:02 a.m., Christian König wrote:
>>> Next step towards HMM support. For now just silence the retry fault and
>>> optionally redirect the request to the dummy page.
>>>
>>> v2: make sure the VM is not destroyed while we handle the fault.
>>>
>>> Signed-off-by: Christian König 
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 
>>> ++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>>    drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>>    3 files changed, 80 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 951608fc1925..410d89966a66 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm 
>>> *vm)
>>>    }
>>>    }
>>>    }
>>> +
>>> +/**
>>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>>> + * @adev: amdgpu device pointer
>>> + * @pasid: PASID of the VM
>>> + * @addr: Address of the fault
>>> + *
>>> + * Try to gracefully handle a VM fault. Return true if the fault was 
>>> handled and
>>> + * shouldn't be reported any more.
>>> + */
>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int 
>>> pasid,
>>> +    uint64_t addr)
>>> +{
>>> +    struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>>> +    struct amdgpu_bo *root;
>>> +    uint64_t value, flags;
>>> +    struct amdgpu_vm *vm;
>>> +    long r;
>>> +
>>> +    if (!ring->sched.ready)
>>> +    return false;
>>> +
>>> +    spin_lock(&adev->vm_manager.pasid_lock);
>>> +    vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>> +    if (vm)
>>> +    root = amdgpu_bo_ref(vm->root.base.bo);
>>> +    else
>>> +    root = NULL;
>>> +    spin_unlock(&adev->vm_manager.pasid_lock);
>>> +
>>> +    if (!root)
>>> +    return false;
>>> +
>>> +    r = amdgpu_bo_reserve(root, true);
>>> +    if (r)
>>> +    goto error_unref;
>>> +
>>> +    spin_lock(&adev->vm_manager.pasid_lock);
>>> +    vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>> +    spin_unlock(&adev->vm_manager.pasid_lock);
>>> +
>> Here we get the vm from the pasid a second time and check whether the PD
>> bo changed. Is this to handle a vm fault racing with vm destroy?
> 
> Yes, exactly.
> 
>>
>>> +    if (!vm || vm->root.base.bo != root)
>>> +    goto error_unlock;
>>> +
>>> +    addr /= AMDGPU_GPU_PAGE_SIZE;
>>> +    flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>>> +    AMDGPU_PTE_SYSTEM;
>>> +
>>> +    if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>>> +    /* Redirect the access to the dummy page */
>>> +    value = adev->dummy_page_addr;
>>> +    flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>>> +    AMDGPU_PTE_WRITEABLE;
>>> +    } else {
>>> +    value = 0;
>>> +    }
>>> +
>>> +    r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr 
>>> + 1,
>>> +    flags, value, NULL, NULL);
>>> +    if (r)
>>> +    goto error_unlock;
>>> +
>> After the fault address is redirected to the dummy page, will the fault
>> recover and the retry continue to execute?
> 
> Yes, the read/write operation will just retry and use the value from the 
> dummy page instead.
> 
>> Is it dangerous to update the PTE to use system
>> memory address 0?
> 
> What are you talking about? The dummy page is a page allocated by TTM
> to which we redirect faulty accesses.
> 
In the case where amdgpu_vm_fault_stop equals 
AMDGPU_VM_FAULT_STOP_FIRST/ALWAYS, value is 0, so this will redirect to 
system memory address 0. Maybe the redirect is only needed for 
AMDGPU_VM_FAULT_STOP_NEVER?

Regards,
Philip

> Regards,
> Christian.
> 
>>
>>> +    r = amdgpu_vm_update_pdes(adev, vm, true);
>>> +
>>> +error_unlock:
>>> +    amdgpu_bo_unreserve(root);
>>> +    if (r < 0)
>>> +    DRM_ERROR("Can't handle page fault (%ld)\n", r);
>>> +
>>> +error_unref:
>>> +    amdgpu_bo_unref(&root);
>>> +
>>> +    return false;
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> index 0a97dc839f3b..4dbbe1b6b413 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct 
>>> amdgpu_device *adev);
>>>    void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned 
>>> int pasid,
>>>     struct amdgpu_task_info *task_info);
>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int 
>>> pasid,
>>> +    uint64_t addr);
>>>    void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
>>> diff --git

RE: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

2019-09-09 Thread Zeng, Oak
Is looking up the vm twice necessary? I think we are in interrupt context; is 
it possible for the user space application to be scheduled in between? My 
understanding is that if the user space application can't kick in during 
interrupt handling, the application shouldn't have a chance to exit (and thus 
have its vm destroyed).

Regards,
Oak

-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Monday, September 9, 2019 8:08 AM
To: Kuehling, Felix ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

Am 05.09.19 um 00:47 schrieb Kuehling, Felix:
> On 2019-09-04 11:02 a.m., Christian König wrote:
>> Next step towards HMM support. For now just silence the retry fault 
>> and optionally redirect the request to the dummy page.
>>
>> v2: make sure the VM is not destroyed while we handle the fault.
>>
>> Signed-off-by: Christian König 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++
>>drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>3 files changed, 80 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index 951608fc1925..410d89966a66 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>>  }
>>  }
>>}
>> +
>> +/**
>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>> + * @adev: amdgpu device pointer
>> + * @pasid: PASID of the VM
>> + * @addr: Address of the fault
>> + *
>> + * Try to gracefully handle a VM fault. Return true if the fault was 
>> +handled and
>> + * shouldn't be reported any more.
>> + */
>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> +uint64_t addr)
>> +{
>> +struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>> +struct amdgpu_bo *root;
>> +uint64_t value, flags;
>> +struct amdgpu_vm *vm;
>> +long r;
>> +
>> +if (!ring->sched.ready)
>> +return false;
>> +
>> +spin_lock(&adev->vm_manager.pasid_lock);
>> +vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>> +if (vm)
>> +root = amdgpu_bo_ref(vm->root.base.bo);
>> +else
>> +root = NULL;
>> +spin_unlock(&adev->vm_manager.pasid_lock);
>> +
>> +if (!root)
>> +return false;
>> +
>> +r = amdgpu_bo_reserve(root, true);
>> +if (r)
>> +goto error_unref;
>> +
>> +spin_lock(&adev->vm_manager.pasid_lock);
>> +vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>> +spin_unlock(&adev->vm_manager.pasid_lock);
> I think this deserves a comment. If I understand it correctly, you're 
> looking up the vm twice so that you have the VM root reservation to 
> protect against user-after-free. Otherwise the vm pointer is only 
> valid as long as you're holding the spin-lock.
>
>
>> +
>> +if (!vm || vm->root.base.bo != root)
> The check of vm->root.base.bo should probably still be under the 
> spin_lock. Because you're not sure yet it's the right VM, you can't 
> rely on the reservation here to prevent use-after-free.

Good point, going to fix that.

>
>
>> +goto error_unlock;
>> +
>> +addr /= AMDGPU_GPU_PAGE_SIZE;
>> +flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>> +AMDGPU_PTE_SYSTEM;
>> +
>> +if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>> +/* Redirect the access to the dummy page */
>> +value = adev->dummy_page_addr;
>> +flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>> +AMDGPU_PTE_WRITEABLE;
>> +} else {
>> +value = 0;
>> +}
>> +
>> +r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>> +flags, value, NULL, NULL);
>> +if (r)
>> +goto error_unlock;
>> +
>> +r = amdgpu_vm_update_pdes(adev, vm, true);
>> +
>> +error_unlock:
>> +amdgpu_bo_unreserve(root);
>> +if (r < 0)
>> +DRM_ERROR("Can't handle page fault (%ld)\n", r);
>> +
>> +error_unref:
>> +amdgpu_bo_unref(&root);
>> +
>> +return false;
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> index 0a97dc839f3b..4dbbe1b6b413 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>> @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct 
>> amdgpu_device *adev);
>>
>>void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int 
>> pasid,
>>   struct amdgpu_task_info *task_info);
>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>> +uint64_t addr);
>>
   void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);

[PATCH 0/2] Enable Dali for DC

2019-09-09 Thread Bhawanpreet Lakha
Dali is a new asic based on raven. This patch adds the asic ID and
support for it in the display core.

Bhawanpreet Lakha (2):
  drm/amd/display: add Asic ID for Dali
  drm/amd/display: Implement voltage limitation for dali

 drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c  | 4 ++++
 drivers/gpu/drm/amd/display/include/dal_asic_id.h | 7 +++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 1/2] drm/amd/display: add Asic ID for Dali

2019-09-09 Thread Bhawanpreet Lakha
Dali is a new asic revision based on raven2.

Add the ID and ASICREV_IS_DALI define

Signed-off-by: Bhawanpreet Lakha 
Reviewed-by: Huang Rui 
---
 drivers/gpu/drm/amd/display/include/dal_asic_id.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/include/dal_asic_id.h 
b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
index 1f16892f0add..1be6c44fd32f 100644
--- a/drivers/gpu/drm/amd/display/include/dal_asic_id.h
+++ b/drivers/gpu/drm/amd/display/include/dal_asic_id.h
@@ -137,10 +137,13 @@
 #define RAVEN1_F0 0xF0
 #define RAVEN_UNKNOWN 0xFF
 
+#define PICASSO_15D8_REV_E3 0xE3
+#define PICASSO_15D8_REV_E4 0xE4
+
 #define ASICREV_IS_RAVEN(eChipRev) ((eChipRev >= RAVEN_A0) && eChipRev < RAVEN_UNKNOWN)
 #define ASICREV_IS_PICASSO(eChipRev) ((eChipRev >= PICASSO_A0) && (eChipRev < RAVEN2_A0))
-#define ASICREV_IS_RAVEN2(eChipRev) ((eChipRev >= RAVEN2_A0) && (eChipRev < 0xF0))
-
+#define ASICREV_IS_RAVEN2(eChipRev) ((eChipRev >= RAVEN2_A0) && (eChipRev < PICASSO_15D8_REV_E3))
+#define ASICREV_IS_DALI(eChipRev) ((eChipRev >= PICASSO_15D8_REV_E3) && (eChipRev < RAVEN1_F0))
 
 #define ASICREV_IS_RV1_F0(eChipRev) ((eChipRev >= RAVEN1_F0) && (eChipRev < RAVEN_UNKNOWN))
 
-- 
2.17.1


[PATCH 2/2] drm/amd/display: Implement voltage limitation for dali

2019-09-09 Thread Bhawanpreet Lakha
[Why]
We only want the lowest voltage to be available for dali.

[How]
Use the get_highest_allowed_voltage_level function
to return 0 for dali

Signed-off-by: Bhawanpreet Lakha 
Reviewed-by: Huang Rui 
---
 drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c 
b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
index 383f4f8db8f4..9b2cb57bf2ba 100644
--- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
@@ -708,6 +708,10 @@ static void hack_bounding_box(struct dcn_bw_internal_vars *v,
 
 unsigned int get_highest_allowed_voltage_level(uint32_t hw_internal_rev)
 {
+   /* for dali, the highest voltage level we want is 0 */
+   if (ASICREV_IS_DALI(hw_internal_rev))
+   return 0;
+
/* we are ok with all levels */
return 4;
 }
-- 
2.17.1


[PATCH] drm/amdgpu: Avoid RAS recovery init when no RAS support.

2019-09-09 Thread Andrey Grodzovsky
Fixes driver load regression on APUs.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index d2437e1..119bedc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1494,9 +1494,14 @@ static int amdgpu_ras_release_bad_pages(struct 
amdgpu_device *adev)
 int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
 {
struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
-   struct ras_err_handler_data **data = &con->eh_data;
+   struct ras_err_handler_data **data;
int ret;
 
+   if (con)
+   data = &con->eh_data;
+   else
+   return 0;
+
*data = kmalloc(sizeof(**data), GFP_KERNEL | __GFP_ZERO);
if (!*data) {
ret = -ENOMEM;
-- 
2.7.4


Re: [PATCH] drm/amdgpu: Avoid RAS recovery init when no RAS support.

2019-09-09 Thread Zhang, Hawking
Reviewed-by: Hawking Zhang 

Regards,
Hawking

Sent from my iPhone

> On Sep 9, 2019, at 10:49, Andrey Grodzovsky  wrote:
> 
> Fixes driver load regression on APUs.
> 
> Signed-off-by: Andrey Grodzovsky 
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index d2437e1..119bedc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -1494,9 +1494,14 @@ static int amdgpu_ras_release_bad_pages(struct 
> amdgpu_device *adev)
> int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
> {
>struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
> -struct ras_err_handler_data **data = &con->eh_data;
> +struct ras_err_handler_data **data;
>int ret;
> 
> +if (con)
> +data = &con->eh_data;
> +else
> +return 0;
> +
>*data = kmalloc(sizeof(**data), GFP_KERNEL | __GFP_ZERO);
>if (!*data) {
>ret = -ENOMEM;
> -- 
> 2.7.4
> 

Re: [PATCH AUTOSEL 4.19 044/167] drm/amdgpu: validate user pitch alignment

2019-09-09 Thread Michel Dänzer
On 2019-09-07 4:58 p.m., Alex Deucher wrote:
>
> The patch shuffling doesn't help, but regardless, the same thing could
> happen even with a direct committer tree if someone missed the tag when
> committing.

True, but in the latter case it would at least be possible to add tags
referencing persistent commit hashes regardless of when fix-ups happen,
whereas with the former this isn't possible before the original change
makes it to Linus or at least Dave.


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer

Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

2019-09-09 Thread Koenig, Christian
Well, first of all we are not in interrupt context here; this is handled 
by a work item, otherwise we couldn't do all the locking.

But even in interrupt context another CPU can easily destroy the VM when 
we just handle a stale fault or the process was killed.

So this extra double checking is strictly necessary.

Regards,
Christian.
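
The lookup/reserve/re-check pattern under discussion can be sketched in a small stand-alone model (all struct and function names here are invented for illustration and are not the actual amdgpu code — the real driver uses `idr_find()` under `vm_manager.pasid_lock` and a TTM reservation on the root BO):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the race: the pasid -> vm mapping is only stable while the
 * pasid lock is held, so after dropping the lock and taking the root BO
 * reservation the handler must look the vm up again and compare its root. */
struct vm {
	void *root;	/* stands in for vm->root.base.bo */
};

#define MAX_PASID 8
static struct vm *pasid_table[MAX_PASID];	/* stands in for pasid_idr */

static struct vm *lookup_vm(unsigned int pasid)
{
	/* in the driver this lookup runs under vm_manager.pasid_lock */
	return pasid < MAX_PASID ? pasid_table[pasid] : NULL;
}

/* Second lookup, after the reservation was taken: only if the same vm is
 * still registered and still owns the reserved root may the fault handler
 * proceed; otherwise the vm was destroyed (or replaced) in the meantime. */
static bool vm_still_valid(unsigned int pasid, void *reserved_root)
{
	struct vm *vm = lookup_vm(pasid);

	return vm && vm->root == reserved_root;
}
```

If another thread tears the vm down between the first lookup and the reservation, the second lookup fails the comparison and the handler bails out instead of touching freed memory.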

Am 09.09.19 um 16:08 schrieb Zeng, Oak:
> Is looking up the vm twice necessary? I think we are in interrupt context; is it
> possible that the user space application can be switched in in between? My
> understanding is that if the user space application can't kick in during interrupt
> handling, the application shouldn't have a chance to exit (and thus have its vm
> destroyed).
>
> Regards,
> Oak
>
> -Original Message-
> From: amd-gfx  On Behalf Of Christian 
> König
> Sent: Monday, September 9, 2019 8:08 AM
> To: Kuehling, Felix ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2
>
> Am 05.09.19 um 00:47 schrieb Kuehling, Felix:
>> On 2019-09-04 11:02 a.m., Christian König wrote:
>>> Next step towards HMM support. For now just silence the retry fault
>>> and optionally redirect the request to the dummy page.
>>>
>>> v2: make sure the VM is not destroyed while we handle the fault.
>>>
>>> Signed-off-by: Christian König 
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74 ++
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
>>> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
>>> 3 files changed, 80 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> index 951608fc1925..410d89966a66 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>>> @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>>> }
>>> }
>>> }
>>> +
>>> +/**
>>> + * amdgpu_vm_handle_fault - graceful handling of VM faults.
>>> + * @adev: amdgpu device pointer
>>> + * @pasid: PASID of the VM
>>> + * @addr: Address of the fault
>>> + *
>>> + * Try to gracefully handle a VM fault. Return true if the fault was
>>> +handled and
>>> + * shouldn't be reported any more.
>>> + */
>>> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
>>> +   uint64_t addr)
>>> +{
>>> +   struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
>>> +   struct amdgpu_bo *root;
>>> +   uint64_t value, flags;
>>> +   struct amdgpu_vm *vm;
>>> +   long r;
>>> +
>>> +   if (!ring->sched.ready)
>>> +   return false;
>>> +
>>> +   spin_lock(&adev->vm_manager.pasid_lock);
>>> +   vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>> +   if (vm)
>>> +   root = amdgpu_bo_ref(vm->root.base.bo);
>>> +   else
>>> +   root = NULL;
>>> +   spin_unlock(&adev->vm_manager.pasid_lock);
>>> +
>>> +   if (!root)
>>> +   return false;
>>> +
>>> +   r = amdgpu_bo_reserve(root, true);
>>> +   if (r)
>>> +   goto error_unref;
>>> +
>>> +   spin_lock(&adev->vm_manager.pasid_lock);
>>> +   vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
>>> +   spin_unlock(&adev->vm_manager.pasid_lock);
>> I think this deserves a comment. If I understand it correctly, you're
>> looking up the vm twice so that you have the VM root reservation to
>> protect against user-after-free. Otherwise the vm pointer is only
>> valid as long as you're holding the spin-lock.
>>
>>
>>> +
>>> +   if (!vm || vm->root.base.bo != root)
>> The check of vm->root.base.bo should probably still be under the
>> spin_lock. Because you're not sure yet it's the right VM, you can't
>> rely on the reservation here to prevent use-after-free.
> Good point, going to fix that.
>
>>
>>> +   goto error_unlock;
>>> +
>>> +   addr /= AMDGPU_GPU_PAGE_SIZE;
>>> +   flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
>>> +   AMDGPU_PTE_SYSTEM;
>>> +
>>> +   if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
>>> +   /* Redirect the access to the dummy page */
>>> +   value = adev->dummy_page_addr;
>>> +   flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
>>> +   AMDGPU_PTE_WRITEABLE;
>>> +   } else {
>>> +   value = 0;
>>> +   }
>>> +
>>> +   r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 1,
>>> +   flags, value, NULL, NULL);
>>> +   if (r)
>>> +   goto error_unlock;
>>> +
>>> +   r = amdgpu_vm_update_pdes(adev, vm, true);
>>> +
>>> +error_unlock:
>>> +   amdgpu_bo_unreserve(root);
>>> +   if (r < 0)
>>> +   DRM_ERROR("Can't handle page fault (%ld)\n", r);
>>> +
>>> +error_unref:
>>> +   amdgpu_bo_unref(&root);
>>> +
>>> +   return false;
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
>>> index 0a97dc839f3b..4dbbe1b6b413 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

Re: [PATCH 9/9] drm/amdgpu: add graceful VM fault handling v2

2019-09-09 Thread Koenig, Christian
Am 09.09.19 um 15:58 schrieb Yang, Philip:
>
> On 2019-09-09 8:03 a.m., Christian König wrote:
>> Am 04.09.19 um 22:12 schrieb Yang, Philip:
>>> This series looks nice and clear for me, two questions embedded below.
>>>
>>> Are we going to use dedicated sdma page queue for direct VM update path
>>> during a fault?
>>>
>>> Thanks,
>>> Philip
>>>
>>> On 2019-09-04 11:02 a.m., Christian König wrote:
 Next step towards HMM support. For now just silence the retry fault and
 optionally redirect the request to the dummy page.

 v2: make sure the VM is not destroyed while we handle the fault.

 Signed-off-by: Christian König 
 ---
     drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 74
 ++
     drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  2 +
     drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  4 ++
     3 files changed, 80 insertions(+)

 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
 index 951608fc1925..410d89966a66 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
 @@ -3142,3 +3142,77 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm
 *vm)
     }
     }
     }
 +
 +/**
 + * amdgpu_vm_handle_fault - graceful handling of VM faults.
 + * @adev: amdgpu device pointer
 + * @pasid: PASID of the VM
 + * @addr: Address of the fault
 + *
 + * Try to gracefully handle a VM fault. Return true if the fault was
 handled and
 + * shouldn't be reported any more.
 + */
 +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int
 pasid,
 +    uint64_t addr)
 +{
 +    struct amdgpu_ring *ring = &adev->sdma.instance[0].page;
 +    struct amdgpu_bo *root;
 +    uint64_t value, flags;
 +    struct amdgpu_vm *vm;
 +    long r;
 +
 +    if (!ring->sched.ready)
 +    return false;
 +
 +    spin_lock(&adev->vm_manager.pasid_lock);
 +    vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
 +    if (vm)
 +    root = amdgpu_bo_ref(vm->root.base.bo);
 +    else
 +    root = NULL;
 +    spin_unlock(&adev->vm_manager.pasid_lock);
 +
 +    if (!root)
 +    return false;
 +
 +    r = amdgpu_bo_reserve(root, true);
 +    if (r)
 +    goto error_unref;
 +
 +    spin_lock(&adev->vm_manager.pasid_lock);
 +    vm = idr_find(&adev->vm_manager.pasid_idr, pasid);
 +    spin_unlock(&adev->vm_manager.pasid_lock);
 +
>>> Here you get the vm from the pasid a second time and check whether the PD
>>> bo has changed; is this to handle a vm fault racing with vm destroy?
>> Yes, exactly.
>>
 +    if (!vm || vm->root.base.bo != root)
 +    goto error_unlock;
 +
 +    addr /= AMDGPU_GPU_PAGE_SIZE;
 +    flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
 +    AMDGPU_PTE_SYSTEM;
 +
 +    if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
 +    /* Redirect the access to the dummy page */
 +    value = adev->dummy_page_addr;
 +    flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
 +    AMDGPU_PTE_WRITEABLE;
 +    } else {
 +    value = 0;
 +    }
 +
 +    r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr
 + 1,
 +    flags, value, NULL, NULL);
 +    if (r)
 +    goto error_unlock;
 +
>>> After the fault address is redirected to the dummy page, will the fault
>>> recover and the retry continue to execute?
>> Yes, the read/write operation will just retry and use the value from the
>> dummy page instead.
>>
>>> Isn't it dangerous to update the PTE to use system
>>> memory address 0?
>> What are you talking about? The dummy page is a page allocate by TTM
>> where we redirect faulty accesses to.
>>
> For the case where amdgpu_vm_fault_stop equals AMDGPU_VM_FAULT_STOP_FIRST/ALWAYS,
> value is 0, so this will redirect to system memory address 0. Maybe the redirect
> is only needed for AMDGPU_VM_FAULT_STOP_NEVER?

The value 0 doesn't redirect to system memory; it results in a silent 
retry when neither the R nor the W bit is set in a PTE.

Regards,
Christian.
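
The distinction can be illustrated with a small stand-alone sketch (the bit positions and the dummy-page address below are placeholders, not the real AMDGPU_PTE_* definitions): with FAULT_STOP_NEVER the faulting address is mapped to the dummy page with R/W set, otherwise the PTE value is 0 with neither R nor W set, which the hardware treats as a silent retry rather than an access to address 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder PTE bits for illustration; the real values live in amdgpu. */
#define PTE_VALID     (1ull << 0)
#define PTE_SYSTEM    (1ull << 1)
#define PTE_SNOOPED   (1ull << 2)
#define PTE_READABLE  (1ull << 5)
#define PTE_WRITEABLE (1ull << 6)

#define DUMMY_PAGE_ADDR 0x1000ull	/* stands in for adev->dummy_page_addr */

/* Simplified mirror of the branch in the patch: pick the PTE value and
 * flags depending on whether faults should be silenced forever. */
static void fault_pte(bool stop_never, uint64_t *value, uint64_t *flags)
{
	*flags = PTE_VALID | PTE_SNOOPED | PTE_SYSTEM;

	if (stop_never) {
		/* redirect the access to the dummy page */
		*value = DUMMY_PAGE_ADDR;
		*flags |= PTE_READABLE | PTE_WRITEABLE;
	} else {
		/* value 0 with neither R nor W set: the retry is
		 * silently dropped, no memory access happens */
		*value = 0;
	}
}
```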

>
> Regards,
> Philip
>
>> Regards,
>> Christian.
>>
 +    r = amdgpu_vm_update_pdes(adev, vm, true);
 +
 +error_unlock:
 +    amdgpu_bo_unreserve(root);
 +    if (r < 0)
 +    DRM_ERROR("Can't handle page fault (%ld)\n", r);
 +
 +error_unref:
 +    amdgpu_bo_unref(&root);
 +
 +    return false;
 +}
 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
 index 0a97dc839f3b..4dbbe1b6b413 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
 @@ -413,6 +413,8 @@ void amdgpu_vm_check_compute_bug(struct
 amdgpu_device *adev);
     void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,

Re: [PATCH 0/2] Enable Dali for DC

2019-09-09 Thread Harry Wentland
Series is
Acked-by: Harry Wentland 

Harry

On 2019-09-09 10:37 a.m., Bhawanpreet Lakha wrote:
> Dali is a new asic based on raven. This patch adds the asic ID and
> support for it in the display core.
> 
> Bhawanpreet Lakha (2):
>drm/amd/display: add Asic ID for Dali
>drm/amd/display: Implement voltage limitation for dali
> 
>   drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c  | 4 ++++
>   drivers/gpu/drm/amd/display/include/dal_asic_id.h | 7 +++++--
>   2 files changed, 9 insertions(+), 2 deletions(-)
> 

[PATCH 2/2] drm/amdgpu: Allow to reset to EERPOM table.

2019-09-09 Thread Andrey Grodzovsky
The table grows quickly during debug/development when multiple RAS
errors are injected. Allow avoiding this by setting the table header
back to empty if needed.

v2: Switch to debugfs entry instead of load time parameter.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 119bedc..52c5c61 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -303,6 +303,17 @@ static ssize_t amdgpu_ras_debugfs_ctrl_write(struct file *f, const char __user *
return size;
 }
 
+static ssize_t amdgpu_ras_debugfs_eeprom_write(struct file *f, const char __user *buf,
+   size_t size, loff_t *pos)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)file_inode(f)->i_private;
+   int ret;
+
+   ret = amdgpu_ras_eeprom_reset_table(&adev->psp.ras.ras->eeprom_control);
+
+   return ret == 1 ? size : -EIO;
+}
+
 static const struct file_operations amdgpu_ras_debugfs_ctrl_ops = {
.owner = THIS_MODULE,
.read = NULL,
@@ -310,6 +321,13 @@ static const struct file_operations amdgpu_ras_debugfs_ctrl_ops = {
.llseek = default_llseek
 };
 
+static const struct file_operations amdgpu_ras_debugfs_eeprom_ops = {
+   .owner = THIS_MODULE,
+   .read = NULL,
+   .write = amdgpu_ras_debugfs_eeprom_write,
+   .llseek = default_llseek
+};
+
 static ssize_t amdgpu_ras_sysfs_read(struct device *dev,
struct device_attribute *attr, char *buf)
 {
@@ -951,6 +969,8 @@ static void amdgpu_ras_debugfs_create_ctrl_node(struct amdgpu_device *adev)
con->dir = debugfs_create_dir("ras", minor->debugfs_root);
con->ent = debugfs_create_file("ras_ctrl", S_IWUGO | S_IRUGO, con->dir,
   adev, &amdgpu_ras_debugfs_ctrl_ops);
+   con->ent = debugfs_create_file("ras_eeprom_reset", S_IWUGO | S_IRUGO, con->dir,
+  adev, &amdgpu_ras_debugfs_eeprom_ops);
 }
 
 void amdgpu_ras_debugfs_create(struct amdgpu_device *adev,
-- 
2.7.4
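
One detail worth noting in the handler above: the eeprom update path reports success as 1 (one i2c message transferred), so the debugfs `.write` callback maps that to "all bytes consumed" and anything else to -EIO. A minimal user-space sketch of that return-value translation (the constant and helper name are illustrative, not driver code):

```c
#include <assert.h>
#include <stddef.h>

#define EIO 5	/* matches Linux asm-generic errno; illustrative here */

/* Translate the eeprom-update result into a debugfs .write return value:
 * 1 (one i2c message transferred) means success, so report the whole
 * buffer as consumed; anything else becomes -EIO. */
static long eeprom_write_result(int update_ret, size_t size)
{
	return update_ret == 1 ? (long)size : -EIO;
}
```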


[PATCH 1/2] drm/amdgpu: Add amdgpu_ras_eeprom_reset_table

2019-09-09 Thread Andrey Grodzovsky
This will allow to reset the table on the fly.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 25 +++++++++++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h |  1 +
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 43dd4ab..11a8445 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -102,6 +102,22 @@ static int __update_table_header(struct amdgpu_ras_eeprom_control *control,
 
 static uint32_t  __calc_hdr_byte_sum(struct amdgpu_ras_eeprom_control *control);
 
+int amdgpu_ras_eeprom_reset_table(struct amdgpu_ras_eeprom_control *control)
+{
+   unsigned char buff[EEPROM_ADDRESS_SIZE + EEPROM_TABLE_HEADER_SIZE] = { 0 };
+   struct amdgpu_device *adev = to_amdgpu_device(control);
+   struct amdgpu_ras_eeprom_table_header *hdr = &control->tbl_hdr;
+
+   hdr->header = EEPROM_TABLE_HDR_VAL;
+   hdr->version = EEPROM_TABLE_VER;
+   hdr->first_rec_offset = EEPROM_RECORD_START;
+   hdr->tbl_size = EEPROM_TABLE_HEADER_SIZE;
+
+   adev->psp.ras.ras->eeprom_control.tbl_byte_sum =
+   __calc_hdr_byte_sum(&adev->psp.ras.ras->eeprom_control);
+   return __update_table_header(control, buff);
+}
+
 int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control)
 {
int ret = 0;
@@ -149,14 +165,7 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control)
} else {
DRM_INFO("Creating new EEPROM table");
 
-   hdr->header = EEPROM_TABLE_HDR_VAL;
-   hdr->version = EEPROM_TABLE_VER;
-   hdr->first_rec_offset = EEPROM_RECORD_START;
-   hdr->tbl_size = EEPROM_TABLE_HEADER_SIZE;
-
-   adev->psp.ras.ras->eeprom_control.tbl_byte_sum =
-   __calc_hdr_byte_sum(&adev->psp.ras.ras->eeprom_control);
-   ret = __update_table_header(control, buff);
+   ret = amdgpu_ras_eeprom_reset_table(control);
}
 
/* Start inserting records from here */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
index 41f3fcb..6222699 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
@@ -79,6 +79,7 @@ struct eeprom_table_record {
 
 int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control);
 void amdgpu_ras_eeprom_fini(struct amdgpu_ras_eeprom_control *control);
+int amdgpu_ras_eeprom_reset_table(struct amdgpu_ras_eeprom_control *control);
 
 int amdgpu_ras_eeprom_process_recods(struct amdgpu_ras_eeprom_control *control,
struct eeprom_table_record *records,
-- 
2.7.4


[PATCH V5] drm: Add link training repeaters addresses

2019-09-09 Thread Siqueira, Rodrigo
The DP 1.3 specification introduces the Link Training-tunable PHY Repeater,
and DP 1.4* supplemented it with new features. The 1.4a spec introduced
some innovations that make it handy to add support for systems with
Thunderbolt or other repeater devices.

It is important to highlight that DP specification had some updates from
1.3 through 1.4a. In particular, DP 1.4 defines Repeater_FEC_CAPABILITY
at the address 0xf0004, and DP 1.4a redefined the address 0xf0004 to
DP_MAX_LANE_COUNT_PHY_REPEATER.

Changes since V4:
- Update commit message
- Fix misleading comments related to the spec version
Changes since V3:
- Replace spaces by tabs
Changes since V2:
- Drop the kernel-doc comment
- Reorder LTTPR according to register offset
Changes since V1:
- Adjusts registers names to be aligned with spec and the rest of the
  file
- Update spec comment from 1.4 to 1.4a

Cc: Abdoulaye Berthe 
Cc: Harry Wentland 
Cc: Leo Li 
Cc: Jani Nikula 
Cc: Manasi Navare 
Cc: Ville Syrjälä 
Signed-off-by: Rodrigo Siqueira 
Signed-off-by: Abdoulaye Berthe 
Reviewed-by: Harry Wentland 
---
 include/drm/drm_dp_helper.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
index 7972b925a952..fddcd84601f8 100644
--- a/include/drm/drm_dp_helper.h
+++ b/include/drm/drm_dp_helper.h
@@ -966,6 +966,32 @@
 #define DP_HDCP_2_2_REG_STREAM_TYPE_OFFSET 0x69494
 #define DP_HDCP_2_2_REG_DBG_OFFSET 0x69518
 
+/* Link Training (LT)-tunable PHY Repeaters */
+#define DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV 0xf0000 /* 1.3 */
+#define DP_MAX_LINK_RATE_PHY_REPEATER                       0xf0001 /* 1.4a */
+#define DP_PHY_REPEATER_CNT                                 0xf0002 /* 1.3 */
+#define DP_PHY_REPEATER_MODE                                0xf0003 /* 1.3 */
+#define DP_MAX_LANE_COUNT_PHY_REPEATER                      0xf0004 /* 1.4a */
+#define DP_Repeater_FEC_CAPABILITY                          0xf0004 /* 1.4 */
+#define DP_PHY_REPEATER_EXTENDED_WAIT_TIMEOUT               0xf0005 /* 1.4a */
+#define DP_TRAINING_PATTERN_SET_PHY_REPEATER1               0xf0010 /* 1.3 */
+#define DP_TRAINING_LANE0_SET_PHY_REPEATER1                 0xf0011 /* 1.3 */
+#define DP_TRAINING_LANE1_SET_PHY_REPEATER1                 0xf0012 /* 1.3 */
+#define DP_TRAINING_LANE2_SET_PHY_REPEATER1                 0xf0013 /* 1.3 */
+#define DP_TRAINING_LANE3_SET_PHY_REPEATER1                 0xf0014 /* 1.3 */
+#define DP_TRAINING_AUX_RD_INTERVAL_PHY_REPEATER1           0xf0020 /* 1.4a */
+#define DP_TRANSMITTER_CAPABILITY_PHY_REPEATER1             0xf0021 /* 1.4a */
+#define DP_LANE0_1_STATUS_PHY_REPEATER1                     0xf0030 /* 1.3 */
+#define DP_LANE2_3_STATUS_PHY_REPEATER1                     0xf0031 /* 1.3 */
+#define DP_LANE_ALIGN_STATUS_UPDATED_PHY_REPEATER1          0xf0032 /* 1.3 */
+#define DP_ADJUST_REQUEST_LANE0_1_PHY_REPEATER1             0xf0033 /* 1.3 */
+#define DP_ADJUST_REQUEST_LANE2_3_PHY_REPEATER1             0xf0034 /* 1.3 */
+#define DP_SYMBOL_ERROR_COUNT_LANE0_PHY_REPEATER1           0xf0035 /* 1.3 */
+#define DP_SYMBOL_ERROR_COUNT_LANE1_PHY_REPEATER1           0xf0037 /* 1.3 */
+#define DP_SYMBOL_ERROR_COUNT_LANE2_PHY_REPEATER1           0xf0039 /* 1.3 */
+#define DP_SYMBOL_ERROR_COUNT_LANE3_PHY_REPEATER1           0xf003b /* 1.3 */
+#define DP_FEC_STATUS_PHY_REPEATER1                         0xf0290 /* 1.4 */
+
 /* DP HDCP message start offsets in DPCD address space */
 #define DP_HDCP_2_2_AKE_INIT_OFFSETDP_HDCP_2_2_REG_RTX_OFFSET
 #define DP_HDCP_2_2_AKE_SEND_CERT_OFFSET   DP_HDCP_2_2_REG_CERT_RX_OFFSET
-- 
2.23.0
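
The `*_PHY_REPEATER1` addresses above are only the first instance of each field: per the DPCD layout the LTTPR register block repeats for each repeater at a fixed 0x50 stride (so, for example, the training pattern set register for repeater 2 sits at 0xf0060). A small sketch of that derivation, assuming the 0x50 stride from the 1.4a spec (the stride macro name is illustrative):

```c
#include <assert.h>

#define DP_TRAINING_PATTERN_SET_PHY_REPEATER1	0xf0010
#define DP_LTTPR_REG_STRIDE			0x50	/* per-repeater spacing */

/* Given a *_PHY_REPEATER1 address, compute the same register's address
 * for repeater N (1-based), following the fixed DPCD spacing. */
static unsigned int dp_lttpr_reg(unsigned int rep1_addr, unsigned int repeater)
{
	return rep1_addr + DP_LTTPR_REG_STRIDE * (repeater - 1);
}
```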



Re: [PATCH V5] drm: Add link training repeaters addresses

2019-09-09 Thread Siqueira, Rodrigo
Please, ignore this patch.

I just noticed that I sent the wrong version. I resent the correct patch
with the title:

[PATCH V5] drm: Add definitions for link training repeaters

Sorry for this mistake.

On 09/09, Siqueira, Rodrigo wrote:
> The DP 1.3 specification introduces the Link Training-tunable PHY Repeater,
> and DP 1.4* supplemented it with new features. The 1.4a spec introduced
> some innovations that make it handy to add support for systems with
> Thunderbolt or other repeater devices.
> 
> It is important to highlight that DP specification had some updates from
> 1.3 through 1.4a. In particular, DP 1.4 defines Repeater_FEC_CAPABILITY
> at the address 0xf0004, and DP 1.4a redefined the address 0xf0004 to
> DP_MAX_LANE_COUNT_PHY_REPEATER.
> 
> Changes since V4:
> - Update commit message
> - Fix misleading comments related to the spec version
> Changes since V3:
> - Replace spaces by tabs
> Changes since V2:
> - Drop the kernel-doc comment
> - Reorder LTTPR according to register offset
> Changes since V1:
> - Adjusts registers names to be aligned with spec and the rest of the
>   file
> - Update spec comment from 1.4 to 1.4a
> 
> Cc: Abdoulaye Berthe 
> Cc: Harry Wentland 
> Cc: Leo Li 
> Cc: Jani Nikula 
> Cc: Manasi Navare 
> Cc: Ville Syrjälä 
> Signed-off-by: Rodrigo Siqueira 
> Signed-off-by: Abdoulaye Berthe 
> Reviewed-by: Harry Wentland 
> ---
>  include/drm/drm_dp_helper.h | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> index 7972b925a952..fddcd84601f8 100644
> --- a/include/drm/drm_dp_helper.h
> +++ b/include/drm/drm_dp_helper.h
> @@ -966,6 +966,32 @@
>  #define DP_HDCP_2_2_REG_STREAM_TYPE_OFFSET   0x69494
>  #define DP_HDCP_2_2_REG_DBG_OFFSET   0x69518
>  
> +/* Link Training (LT)-tunable PHY Repeaters */
> +#define DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV 0xf0000 /* 1.3 */
> +#define DP_MAX_LINK_RATE_PHY_REPEATER                       0xf0001 /* 1.4a */
> +#define DP_PHY_REPEATER_CNT                                 0xf0002 /* 1.3 */
> +#define DP_PHY_REPEATER_MODE                                0xf0003 /* 1.3 */
> +#define DP_MAX_LANE_COUNT_PHY_REPEATER                      0xf0004 /* 1.4a */
> +#define DP_Repeater_FEC_CAPABILITY                          0xf0004 /* 1.4 */
> +#define DP_PHY_REPEATER_EXTENDED_WAIT_TIMEOUT               0xf0005 /* 1.4a */
> +#define DP_TRAINING_PATTERN_SET_PHY_REPEATER1               0xf0010 /* 1.3 */
> +#define DP_TRAINING_LANE0_SET_PHY_REPEATER1                 0xf0011 /* 1.3 */
> +#define DP_TRAINING_LANE1_SET_PHY_REPEATER1                 0xf0012 /* 1.3 */
> +#define DP_TRAINING_LANE2_SET_PHY_REPEATER1                 0xf0013 /* 1.3 */
> +#define DP_TRAINING_LANE3_SET_PHY_REPEATER1                 0xf0014 /* 1.3 */
> +#define DP_TRAINING_AUX_RD_INTERVAL_PHY_REPEATER1           0xf0020 /* 1.4a */
> +#define DP_TRANSMITTER_CAPABILITY_PHY_REPEATER1             0xf0021 /* 1.4a */
> +#define DP_LANE0_1_STATUS_PHY_REPEATER1                     0xf0030 /* 1.3 */
> +#define DP_LANE2_3_STATUS_PHY_REPEATER1                     0xf0031 /* 1.3 */
> +#define DP_LANE_ALIGN_STATUS_UPDATED_PHY_REPEATER1          0xf0032 /* 1.3 */
> +#define DP_ADJUST_REQUEST_LANE0_1_PHY_REPEATER1             0xf0033 /* 1.3 */
> +#define DP_ADJUST_REQUEST_LANE2_3_PHY_REPEATER1             0xf0034 /* 1.3 */
> +#define DP_SYMBOL_ERROR_COUNT_LANE0_PHY_REPEATER1           0xf0035 /* 1.3 */
> +#define DP_SYMBOL_ERROR_COUNT_LANE1_PHY_REPEATER1           0xf0037 /* 1.3 */
> +#define DP_SYMBOL_ERROR_COUNT_LANE2_PHY_REPEATER1           0xf0039 /* 1.3 */
> +#define DP_SYMBOL_ERROR_COUNT_LANE3_PHY_REPEATER1           0xf003b /* 1.3 */
> +#define DP_FEC_STATUS_PHY_REPEATER1                         0xf0290 /* 1.4 */
> +
>  /* DP HDCP message start offsets in DPCD address space */
>  #define DP_HDCP_2_2_AKE_INIT_OFFSET  DP_HDCP_2_2_REG_RTX_OFFSET
>  #define DP_HDCP_2_2_AKE_SEND_CERT_OFFSET DP_HDCP_2_2_REG_CERT_RX_OFFSET
> -- 
> 2.23.0



-- 
Rodrigo Siqueira
Software Engineer, Advanced Micro Devices (AMD)
https://siqueira.tech


signature.asc
Description: PGP signature
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH V5] drm: Add definitions for link training repeaters

2019-09-09 Thread Siqueira, Rodrigo
This change adds definitions required for Link Training-tunable PHY
Repeaters, which were introduced in the DP 1.3 specification and extended
with new features in the DP 1.4* revisions.

Changes since V4:
- Update commit message
- Fix misleading comments related to the spec version
Changes since V3:
- Replace spaces by tabs
Changes since V2:
- Drop the kernel-doc comment
- Reorder LTTPR according to register offset
Changes since V1:
- Adjusts registers names to be aligned with spec and the rest of the
  file
- Update spec comment from 1.4 to 1.4a

Cc: Abdoulaye Berthe 
Cc: Harry Wentland 
Cc: Leo Li 
Cc: Jani Nikula 
Cc: Manasi Navare 
Cc: Ville Syrjälä 
Signed-off-by: Rodrigo Siqueira 
Signed-off-by: Abdoulaye Berthe 
Reviewed-by: Harry Wentland 
---
 include/drm/drm_dp_helper.h | 25 +
 1 file changed, 25 insertions(+)

diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
index 7972b925a952..b1a9a0dcc177 100644
--- a/include/drm/drm_dp_helper.h
+++ b/include/drm/drm_dp_helper.h
@@ -966,6 +966,31 @@
 #define DP_HDCP_2_2_REG_STREAM_TYPE_OFFSET 0x69494
 #define DP_HDCP_2_2_REG_DBG_OFFSET 0x69518
 
+/* Link Training (LT)-tunable PHY Repeaters */
+#define DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV 0xf0000 /* 1.3 */
+#define DP_MAX_LINK_RATE_PHY_REPEATER               0xf0001 /* 1.4a */
+#define DP_PHY_REPEATER_CNT                         0xf0002 /* 1.3 */
+#define DP_PHY_REPEATER_MODE                        0xf0003 /* 1.3 */
+#define DP_MAX_LANE_COUNT_PHY_REPEATER              0xf0004 /* 1.4a */
+#define DP_PHY_REPEATER_EXTENDED_WAIT_TIMEOUT       0xf0005 /* 1.4a */
+#define DP_TRAINING_PATTERN_SET_PHY_REPEATER1       0xf0010 /* 1.3 */
+#define DP_TRAINING_LANE0_SET_PHY_REPEATER1         0xf0011 /* 1.3 */
+#define DP_TRAINING_LANE1_SET_PHY_REPEATER1         0xf0012 /* 1.3 */
+#define DP_TRAINING_LANE2_SET_PHY_REPEATER1         0xf0013 /* 1.3 */
+#define DP_TRAINING_LANE3_SET_PHY_REPEATER1         0xf0014 /* 1.3 */
+#define DP_TRAINING_AUX_RD_INTERVAL_PHY_REPEATER1   0xf0020 /* 1.4a */
+#define DP_TRANSMITTER_CAPABILITY_PHY_REPEATER1     0xf0021 /* 1.4a */
+#define DP_LANE0_1_STATUS_PHY_REPEATER1             0xf0030 /* 1.3 */
+#define DP_LANE2_3_STATUS_PHY_REPEATER1             0xf0031 /* 1.3 */
+#define DP_LANE_ALIGN_STATUS_UPDATED_PHY_REPEATER1  0xf0032 /* 1.3 */
+#define DP_ADJUST_REQUEST_LANE0_1_PHY_REPEATER1     0xf0033 /* 1.3 */
+#define DP_ADJUST_REQUEST_LANE2_3_PHY_REPEATER1     0xf0034 /* 1.3 */
+#define DP_SYMBOL_ERROR_COUNT_LANE0_PHY_REPEATER1   0xf0035 /* 1.3 */
+#define DP_SYMBOL_ERROR_COUNT_LANE1_PHY_REPEATER1   0xf0037 /* 1.3 */
+#define DP_SYMBOL_ERROR_COUNT_LANE2_PHY_REPEATER1   0xf0039 /* 1.3 */
+#define DP_SYMBOL_ERROR_COUNT_LANE3_PHY_REPEATER1   0xf003b /* 1.3 */
+#define DP_FEC_STATUS_PHY_REPEATER1                 0xf0290 /* 1.4 */
+
 /* DP HDCP message start offsets in DPCD address space */
 #define DP_HDCP_2_2_AKE_INIT_OFFSETDP_HDCP_2_2_REG_RTX_OFFSET
 #define DP_HDCP_2_2_AKE_SEND_CERT_OFFSET   DP_HDCP_2_2_REG_CERT_RX_OFFSET
-- 
2.23.0


signature.asc
Description: PGP signature
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH 1/2] drm/amdgpu: Add amdgpu_ras_eeprom_reset_table

2019-09-09 Thread Chen, Guchun
Series is: Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Andrey Grodzovsky  
Sent: Tuesday, September 10, 2019 4:04 AM
To: amd-gfx@lists.freedesktop.org
Cc: Chen, Guchun ; Zhou1, Tao ; 
Deucher, Alexander ; Grodzovsky, Andrey 

Subject: [PATCH 1/2] drm/amdgpu: Add amdgpu_ras_eeprom_reset_table

This allows resetting the table on the fly.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 25 +++++++++++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h |  1 +
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 43dd4ab..11a8445 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -102,6 +102,22 @@ static int __update_table_header(struct 
amdgpu_ras_eeprom_control *control,
 
 static uint32_t  __calc_hdr_byte_sum(struct amdgpu_ras_eeprom_control 
*control);
 
+int amdgpu_ras_eeprom_reset_table(struct amdgpu_ras_eeprom_control *control)
+{
+   unsigned char buff[EEPROM_ADDRESS_SIZE + EEPROM_TABLE_HEADER_SIZE] = { 0 };
+   struct amdgpu_device *adev = to_amdgpu_device(control);
+   struct amdgpu_ras_eeprom_table_header *hdr = &control->tbl_hdr;
+
+   hdr->header = EEPROM_TABLE_HDR_VAL;
+   hdr->version = EEPROM_TABLE_VER;
+   hdr->first_rec_offset = EEPROM_RECORD_START;
+   hdr->tbl_size = EEPROM_TABLE_HEADER_SIZE;
+
+   adev->psp.ras.ras->eeprom_control.tbl_byte_sum =
+   __calc_hdr_byte_sum(&adev->psp.ras.ras->eeprom_control);
+   return __update_table_header(control, buff);
+}
+
 int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control)
 {
int ret = 0;
@@ -149,14 +165,7 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control)
} else {
DRM_INFO("Creating new EEPROM table");
 
-   hdr->header = EEPROM_TABLE_HDR_VAL;
-   hdr->version = EEPROM_TABLE_VER;
-   hdr->first_rec_offset = EEPROM_RECORD_START;
-   hdr->tbl_size = EEPROM_TABLE_HEADER_SIZE;
-
-   adev->psp.ras.ras->eeprom_control.tbl_byte_sum =
-   __calc_hdr_byte_sum(&adev->psp.ras.ras->eeprom_control);
-   ret = __update_table_header(control, buff);
+   ret = amdgpu_ras_eeprom_reset_table(control);
}
 
	/* Start inserting records from here */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
index 41f3fcb..6222699 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h
@@ -79,6 +79,7 @@ struct eeprom_table_record {
 
 int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control);
 void amdgpu_ras_eeprom_fini(struct amdgpu_ras_eeprom_control *control);
+int amdgpu_ras_eeprom_reset_table(struct amdgpu_ras_eeprom_control *control);
 
 int amdgpu_ras_eeprom_process_recods(struct amdgpu_ras_eeprom_control *control,
struct eeprom_table_record *records,
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH 3/3] drm/amdgpu: rename umc ras_init to ras_asic_init

2019-09-09 Thread Zhang, Hawking
RE - err_cnt_init is good for the current implementation but may not be enough
for the future; how about renaming it to umc.funcs->ras_hw_init?

I think it's better that one callback function maps to one specific hw sequence.
Going forward, if additional programming is needed for another generation of IP,
we can add it as a new callback function. But for now, err_cnt_init is exactly
what we did.

Regards,
Hawking
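
The one-callback-per-hw-sequence approach discussed above can be sketched in
miniature. The struct and function names below (umc_funcs, umc_ras_late_init,
the int hw_state) are illustrative stand-ins, not the real amdgpu structures;
the point is only that each programming sequence occupies exactly one callback
slot and a NULL slot means the IP has no such sequence:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the callback-table idea: one slot per
 * hardware programming sequence. */
struct umc_funcs {
	void (*err_cnt_init)(int *hw_state);
};

static void v6_1_err_cnt_init(int *hw_state)
{
	*hw_state = 1;	/* stands in for programming the error counters */
}

static const struct umc_funcs v6_1_funcs = {
	.err_cnt_init = v6_1_err_cnt_init,
};

/* Mirrors the NULL checks the common code performs before dispatch,
 * so an IP without the sequence is a harmless no-op. */
static void umc_ras_late_init(const struct umc_funcs *funcs, int *hw_state)
{
	if (funcs && funcs->err_cnt_init)
		funcs->err_cnt_init(hw_state);
}
```

A later IP generation that needs extra programming would simply add a new
slot rather than overloading err_cnt_init.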

-Original Message-
From: Zhou1, Tao  
Sent: 2019年9月9日 11:01
To: Zhang, Hawking ; amd-gfx@lists.freedesktop.org; 
Chen, Guchun 
Subject: RE: [PATCH 3/3] drm/amdgpu: rename umc ras_init to ras_asic_init

umc.funcs->ras_late_init is common for all versions of umc, so it's implemented 
in amdgpu_umc.c, but ras_asic_init is specific to each version of umc and is 
placed in umc_vx_x.c.
err_cnt_init is good for the current implementation but may not be enough for
the future; how about renaming it to umc.funcs->ras_hw_init?

Regards,
Tao

> -Original Message-
> From: Zhang, Hawking 
> Sent: 2019年9月9日 6:40
> To: Zhang, Hawking ; Zhou1, Tao 
> ; amd-gfx@lists.freedesktop.org; Chen, Guchun 
> 
> Subject: RE: [PATCH 3/3] drm/amdgpu: rename umc ras_init to 
> ras_asic_init
> 
> Never mind. I was confused by the name "ras_asic_init". The 
> programming sequence is exactly what we discussed before. I think we 
> can rename this function to "err_cnt_init".
> 
> Regards,
> Hawking
> -Original Message-
> From: amd-gfx  On Behalf Of 
> Zhang, Hawking
> Sent: 2019年9月9日 6:23
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org; 
> Chen, Guchun 
> Subject: RE: [PATCH 3/3] drm/amdgpu: rename umc ras_init to 
> ras_asic_init
> 
> The ras init (or the new asic_init) seems unnecessary, as we discussed last
> time. Any UMC RAS register initialization is safe enough to be centralized in
> the ras_late_init interface. I would suggest reducing such unnecessary
> interfaces.
> 
> Regards,
> Hawking
> -Original Message-
> From: Zhou1, Tao 
> Sent: 2019年9月6日 17:01
> To: amd-gfx@lists.freedesktop.org; Zhang, Hawking 
> ; Chen, Guchun 
> Cc: Zhou1, Tao 
> Subject: [PATCH 3/3] drm/amdgpu: rename umc ras_init to ras_asic_init
> 
> this interface is related to specific version of umc, distinguish it 
> from ras_late_init
> 
> Signed-off-by: Tao Zhou 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 4 ++-- 
> drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 +-
>  drivers/gpu/drm/amd/amdgpu/umc_v6_1.c   | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> index 5683c51710aa..b1c7f643f198 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> @@ -63,8 +63,8 @@ int amdgpu_umc_ras_late_init(struct amdgpu_device *adev, void *ras_ih_info)
>   }
> 
>   /* ras init of specific umc version */
> - if (adev->umc.funcs && adev->umc.funcs->ras_init)
> - adev->umc.funcs->ras_init(adev);
> + if (adev->umc.funcs && adev->umc.funcs->ras_asic_init)
> + adev->umc.funcs->ras_asic_init(adev);
> 
>   return 0;
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
> index 6f22c9704555..a5e4df2440be 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
> @@ -54,7 +54,7 @@
>   adev->umc.funcs->disable_umc_index_mode(adev);
> 
>  struct amdgpu_umc_funcs {
> - void (*ras_init)(struct amdgpu_device *adev);
> + void (*ras_asic_init)(struct amdgpu_device *adev);
>   int (*ras_late_init)(struct amdgpu_device *adev, void *ras_ih_info);
>   void (*query_ras_error_count)(struct amdgpu_device *adev,
>   void *ras_error_status);
> diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
> b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
> index 4cdb5c04cd17..92f3b148e181 100644
> --- a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
> +++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
> @@ -272,7 +272,7 @@ static void umc_v6_1_ras_init(struct amdgpu_device
> *adev)  }
> 
>  const struct amdgpu_umc_funcs umc_v6_1_funcs = {
> - .ras_init = umc_v6_1_ras_init,
> + .ras_asic_init = umc_v6_1_ras_init,
>   .ras_late_init = amdgpu_umc_ras_late_init,
>   .query_ras_error_count = umc_v6_1_query_ras_error_count,
>   .query_ras_error_address = umc_v6_1_query_ras_error_address,
> --
> 2.17.1
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH 1/2] drm/amdgpu: initialize ras structures for xgmi block

2019-09-09 Thread Zhang, Hawking
I'd like to keep the conditional check internal to amdgpu_xgmi_ras_late_init so
that the function can be called from anywhere without any conditional check on
the caller's side. Please check v2.

Regards,
Hawking
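
The pattern being argued for here — keep the applicability check inside the
init helper so every caller can invoke it unconditionally — looks roughly like
the sketch below. The fake_dev struct is an illustrative stand-in for the
relevant amdgpu_device fields, not the real driver type:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the xgmi-related device state. */
struct fake_dev {
	bool xgmi_supported;
	int  num_physical_nodes;
	bool ras_inited;
};

/* The guard lives inside the helper: on a non-XGMI configuration this
 * is a harmless no-op returning 0, so callers need no check of their own. */
static int xgmi_ras_late_init(struct fake_dev *dev)
{
	if (!dev->xgmi_supported || dev->num_physical_nodes == 0)
		return 0;
	dev->ras_inited = true;	/* stands in for the real ras setup */
	return 0;
}
```

This is why v2 of the patch can call amdgpu_xgmi_ras_late_init directly from
gmc_v9_0_ecc_late_init without the external num_physical_nodes check.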

-Original Message-
From: Chen, Guchun  
Sent: 2019年9月9日 9:22
To: Zhang, Hawking ; amd-gfx@lists.freedesktop.org; 
Zhou1, Tao ; Li, Dennis ; Deucher, 
Alexander 
Subject: RE: [PATCH 1/2] drm/amdgpu: initialize ras structures for xgmi block



-Original Message-
From: Zhang, Hawking  
Sent: Monday, September 9, 2019 6:07 AM
To: amd-gfx@lists.freedesktop.org; Zhou1, Tao ; Chen, Guchun 
; Li, Dennis ; Deucher, Alexander 

Cc: Zhang, Hawking 
Subject: [PATCH 1/2] drm/amdgpu: initialize ras structures for xgmi block

init ras common interface and fs node for xgmi block

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 36 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h |  1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    |  7 +++++++
 4 files changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 331ce50..f09bd30 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -120,6 +120,7 @@ struct amdgpu_xgmi {
/* gpu list in the same hive */
struct list_head head;
bool supported;
+   struct ras_common_if *ras_if;
 };
 
 struct amdgpu_gmc {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 65aae75..7f6f2e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -25,6 +25,7 @@
 #include "amdgpu.h"
 #include "amdgpu_xgmi.h"
 #include "amdgpu_smu.h"
+#include "amdgpu_ras.h"
 #include "df/df_3_6_offset.h"
 
 static DEFINE_MUTEX(xgmi_mutex);
@@ -437,3 +438,38 @@ void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
mutex_unlock(&hive->hive_lock);
}
 }
+
+int amdgpu_xgmi_ras_late_init(struct amdgpu_device *adev)
+{
+   int r;
+   struct ras_ih_if ih_info = {
+   .cb = NULL,
+   };
+   struct ras_fs_if fs_info = {
+   .sysfs_name = "xgmi_wafl_err_count",
+   .debugfs_name = "xgmi_wafl_err_inject",
+   };
+
+   if (!adev->gmc.xgmi.supported ||
+   adev->gmc.xgmi.num_physical_nodes == 0)
+   return 0;
[Guchun] This check on num_physical_nodes == 0 looks redundant, as there is
already one such check outside of this function.
Would it be better to move these two conditions outside of this function, so the
function is only entered when xgmi.supported is true and num_physical_nodes > 0?
if (adev->gmc.xgmi.num_physical_nodes > 1) {
r = amdgpu_xgmi_ras_late_init(adev);
if (r)
return r;
}

+   if (!adev->gmc.xgmi.ras_if) {
+   adev->gmc.xgmi.ras_if = kmalloc(sizeof(struct ras_common_if), 
GFP_KERNEL);
+   if (!adev->gmc.xgmi.ras_if)
+   return -ENOMEM;
+   adev->gmc.xgmi.ras_if->block = AMDGPU_RAS_BLOCK__XGMI_WAFL;
+   adev->gmc.xgmi.ras_if->type = 
AMDGPU_RAS_ERROR__MULTI_UNCORRECTABLE;
+   adev->gmc.xgmi.ras_if->sub_block_index = 0;
+   strcpy(adev->gmc.xgmi.ras_if->name, "xgmi_wafl");
+   }
+   ih_info.head = fs_info.head = *adev->gmc.xgmi.ras_if;
+   r = amdgpu_ras_late_init(adev, adev->gmc.xgmi.ras_if,
+&fs_info, &ih_info);
+   if (r || !amdgpu_ras_is_supported(adev, adev->gmc.xgmi.ras_if->block)) {
+   kfree(adev->gmc.xgmi.ras_if);
+   adev->gmc.xgmi.ras_if = NULL;
+   }
+
+   return r;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index fbcee31..9023789 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -42,6 +42,7 @@ void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);  
int amdgpu_xgmi_set_pstate(struct amdgpu_device *adev, int pstate);  int 
amdgpu_xgmi_get_hops_count(struct amdgpu_device *adev,
struct amdgpu_device *peer_adev);
+int amdgpu_xgmi_ras_late_init(struct amdgpu_device *adev);
 
 static inline bool amdgpu_xgmi_same_hive(struct amdgpu_device *adev,
struct amdgpu_device *bo_adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index beb6c84..05a9a8a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -51,6 +51,7 @@
 #include "ivsrcid/vmc/irqsrcs_vmc_1_0.h"
 
 #include "amdgpu_ras.h"
+#include "amdgpu_xgmi.h"
 
 /* add these here since we already include dce12 headers and these are for DCN 
*/
 #define mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION			0x055d

RE: [PATCH 1/2] drm/amdgpu: initialize ras structures for xgmi block

2019-09-09 Thread Chen, Guchun
That's fine.

Regards,
Guchun

-Original Message-
From: Zhang, Hawking  
Sent: Tuesday, September 10, 2019 11:20 AM
To: Chen, Guchun ; amd-gfx@lists.freedesktop.org; Zhou1, 
Tao ; Li, Dennis ; Deucher, Alexander 

Subject: RE: [PATCH 1/2] drm/amdgpu: initialize ras structures for xgmi block

I'd like to keep the conditional check internal to amdgpu_xgmi_ras_late_init so
that the function can be called from anywhere without any conditional check on
the caller's side. Please check v2.

Regards,
Hawking

-Original Message-
From: Chen, Guchun  
Sent: 2019年9月9日 9:22
To: Zhang, Hawking ; amd-gfx@lists.freedesktop.org; 
Zhou1, Tao ; Li, Dennis ; Deucher, 
Alexander 
Subject: RE: [PATCH 1/2] drm/amdgpu: initialize ras structures for xgmi block



-Original Message-
From: Zhang, Hawking  
Sent: Monday, September 9, 2019 6:07 AM
To: amd-gfx@lists.freedesktop.org; Zhou1, Tao ; Chen, Guchun 
; Li, Dennis ; Deucher, Alexander 

Cc: Zhang, Hawking 
Subject: [PATCH 1/2] drm/amdgpu: initialize ras structures for xgmi block

init ras common interface and fs node for xgmi block

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 36 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h |  1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    |  7 +++++++
 4 files changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 331ce50..f09bd30 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -120,6 +120,7 @@ struct amdgpu_xgmi {
/* gpu list in the same hive */
struct list_head head;
bool supported;
+   struct ras_common_if *ras_if;
 };
 
 struct amdgpu_gmc {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 65aae75..7f6f2e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -25,6 +25,7 @@
 #include "amdgpu.h"
 #include "amdgpu_xgmi.h"
 #include "amdgpu_smu.h"
+#include "amdgpu_ras.h"
 #include "df/df_3_6_offset.h"
 
 static DEFINE_MUTEX(xgmi_mutex);
@@ -437,3 +438,38 @@ void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
mutex_unlock(&hive->hive_lock);
}
 }
+
+int amdgpu_xgmi_ras_late_init(struct amdgpu_device *adev)
+{
+   int r;
+   struct ras_ih_if ih_info = {
+   .cb = NULL,
+   };
+   struct ras_fs_if fs_info = {
+   .sysfs_name = "xgmi_wafl_err_count",
+   .debugfs_name = "xgmi_wafl_err_inject",
+   };
+
+   if (!adev->gmc.xgmi.supported ||
+   adev->gmc.xgmi.num_physical_nodes == 0)
+   return 0;
[Guchun] This check on num_physical_nodes == 0 looks redundant, as there is
already one such check outside of this function.
Would it be better to move these two conditions outside of this function, so the
function is only entered when xgmi.supported is true and num_physical_nodes > 0?
if (adev->gmc.xgmi.num_physical_nodes > 1) {
r = amdgpu_xgmi_ras_late_init(adev);
if (r)
return r;
}

+   if (!adev->gmc.xgmi.ras_if) {
+   adev->gmc.xgmi.ras_if = kmalloc(sizeof(struct ras_common_if), 
GFP_KERNEL);
+   if (!adev->gmc.xgmi.ras_if)
+   return -ENOMEM;
+   adev->gmc.xgmi.ras_if->block = AMDGPU_RAS_BLOCK__XGMI_WAFL;
+   adev->gmc.xgmi.ras_if->type = 
AMDGPU_RAS_ERROR__MULTI_UNCORRECTABLE;
+   adev->gmc.xgmi.ras_if->sub_block_index = 0;
+   strcpy(adev->gmc.xgmi.ras_if->name, "xgmi_wafl");
+   }
+   ih_info.head = fs_info.head = *adev->gmc.xgmi.ras_if;
+   r = amdgpu_ras_late_init(adev, adev->gmc.xgmi.ras_if,
+&fs_info, &ih_info);
+   if (r || !amdgpu_ras_is_supported(adev, adev->gmc.xgmi.ras_if->block)) {
+   kfree(adev->gmc.xgmi.ras_if);
+   adev->gmc.xgmi.ras_if = NULL;
+   }
+
+   return r;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index fbcee31..9023789 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -42,6 +42,7 @@ void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);  
int amdgpu_xgmi_set_pstate(struct amdgpu_device *adev, int pstate);  int 
amdgpu_xgmi_get_hops_count(struct amdgpu_device *adev,
struct amdgpu_device *peer_adev);
+int amdgpu_xgmi_ras_late_init(struct amdgpu_device *adev);
 
 static inline bool amdgpu_xgmi_same_hive(struct amdgpu_device *adev,
struct amdgpu_device *bo_adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index beb6c84..05a9a8a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -51,6

[PATCH 2/2] drm/amdgpu: enable error injection to XGMI block via debugfs

2019-09-09 Thread Zhang, Hawking
allow injecting errors into the XGMI block via the debugfs node ras_ctrl

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 119bedc..d018148 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -679,6 +679,7 @@ int amdgpu_ras_error_inject(struct amdgpu_device *adev,
break;
case AMDGPU_RAS_BLOCK__UMC:
case AMDGPU_RAS_BLOCK__MMHUB:
+   case AMDGPU_RAS_BLOCK__XGMI_WAFL:
ret = psp_ras_trigger_error(&adev->psp, &block_info);
break;
default:
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 1/2] drm/amdgpu: initialize ras structures for xgmi block (v2)

2019-09-09 Thread Zhang, Hawking
init ras common interface and fs node for xgmi block

v2: remove unnecesary physical node number check before
invoking amdgpu_xgmi_ras_late_init

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 36 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h |  1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    |  4 +++-
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 232a8ff..8c8547c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -123,6 +123,7 @@ struct amdgpu_xgmi {
/* gpu list in the same hive */
struct list_head head;
bool supported;
+   struct ras_common_if *ras_if;
 };
 
 struct amdgpu_gmc {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 65aae75..7f6f2e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -25,6 +25,7 @@
 #include "amdgpu.h"
 #include "amdgpu_xgmi.h"
 #include "amdgpu_smu.h"
+#include "amdgpu_ras.h"
 #include "df/df_3_6_offset.h"
 
 static DEFINE_MUTEX(xgmi_mutex);
@@ -437,3 +438,38 @@ void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
mutex_unlock(&hive->hive_lock);
}
 }
+
+int amdgpu_xgmi_ras_late_init(struct amdgpu_device *adev)
+{
+   int r;
+   struct ras_ih_if ih_info = {
+   .cb = NULL,
+   };
+   struct ras_fs_if fs_info = {
+   .sysfs_name = "xgmi_wafl_err_count",
+   .debugfs_name = "xgmi_wafl_err_inject",
+   };
+
+   if (!adev->gmc.xgmi.supported ||
+   adev->gmc.xgmi.num_physical_nodes == 0)
+   return 0;
+
+   if (!adev->gmc.xgmi.ras_if) {
+   adev->gmc.xgmi.ras_if = kmalloc(sizeof(struct ras_common_if), 
GFP_KERNEL);
+   if (!adev->gmc.xgmi.ras_if)
+   return -ENOMEM;
+   adev->gmc.xgmi.ras_if->block = AMDGPU_RAS_BLOCK__XGMI_WAFL;
+   adev->gmc.xgmi.ras_if->type = 
AMDGPU_RAS_ERROR__MULTI_UNCORRECTABLE;
+   adev->gmc.xgmi.ras_if->sub_block_index = 0;
+   strcpy(adev->gmc.xgmi.ras_if->name, "xgmi_wafl");
+   }
+   ih_info.head = fs_info.head = *adev->gmc.xgmi.ras_if;
+   r = amdgpu_ras_late_init(adev, adev->gmc.xgmi.ras_if,
+&fs_info, &ih_info);
+   if (r || !amdgpu_ras_is_supported(adev, adev->gmc.xgmi.ras_if->block)) {
+   kfree(adev->gmc.xgmi.ras_if);
+   adev->gmc.xgmi.ras_if = NULL;
+   }
+
+   return r;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index fbcee31..9023789 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -42,6 +42,7 @@ void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);
 int amdgpu_xgmi_set_pstate(struct amdgpu_device *adev, int pstate);
 int amdgpu_xgmi_get_hops_count(struct amdgpu_device *adev,
struct amdgpu_device *peer_adev);
+int amdgpu_xgmi_ras_late_init(struct amdgpu_device *adev);
 
 static inline bool amdgpu_xgmi_same_hive(struct amdgpu_device *adev,
struct amdgpu_device *bo_adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 0d06c79..4b10692 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -51,6 +51,7 @@
 #include "ivsrcid/vmc/irqsrcs_vmc_1_0.h"
 
 #include "amdgpu_ras.h"
+#include "amdgpu_xgmi.h"
 
 /* add these here since we already include dce12 headers and these are for DCN 
*/
 #define mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION			0x055d
@@ -802,7 +803,8 @@ static int gmc_v9_0_ecc_late_init(void *handle)
if (r)
return r;
}
-   return 0;
+
+   return amdgpu_xgmi_ras_late_init(adev);
 }
 
 static int gmc_v9_0_late_init(void *handle)
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH 2/2] drm/amdgpu: Allow to reset to EERPOM table.

2019-09-09 Thread Zhou1, Tao


> -Original Message-
> From: Andrey Grodzovsky 
> Sent: 2019年9月10日 4:04
> To: amd-gfx@lists.freedesktop.org
> Cc: Chen, Guchun ; Zhou1, Tao
> ; Deucher, Alexander
> ; Grodzovsky, Andrey
> 
> Subject: [PATCH 2/2] drm/amdgpu: Allow to reset to EERPOM table.
> 
> The table grows quickly during debug/development effort when multiple RAS
> errors are injected. Allow to avoid this by setting table header back to empty
> if needed.
> 
> v2: Switch to debugfs entry instead of load time parameter.
> 
> Signed-off-by: Andrey Grodzovsky 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 119bedc..52c5c61 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -303,6 +303,17 @@ static ssize_t amdgpu_ras_debugfs_ctrl_write(struct file *f, const char __user *
>   return size;
>  }
> 
> +static ssize_t amdgpu_ras_debugfs_eeprom_write(struct file *f, const char __user *buf,
> + size_t size, loff_t *pos)
> +{
> + struct amdgpu_device *adev = (struct amdgpu_device *)file_inode(f)->i_private;
> + int ret;
> +
> + ret =
> +amdgpu_ras_eeprom_reset_table(&adev->psp.ras.ras->eeprom_control);

[Tao] It's better to add a tab in front of the function call; with this fixed,
the series is:

Reviewed-by: Tao Zhou 

> +
> + return ret == 1 ? size : -EIO;
> +}
> +
>  static const struct file_operations amdgpu_ras_debugfs_ctrl_ops = {
>   .owner = THIS_MODULE,
>   .read = NULL,
> @@ -310,6 +321,13 @@ static const struct file_operations
> amdgpu_ras_debugfs_ctrl_ops = {
>   .llseek = default_llseek
>  };
> 
> +static const struct file_operations amdgpu_ras_debugfs_eeprom_ops = {
> + .owner = THIS_MODULE,
> + .read = NULL,
> + .write = amdgpu_ras_debugfs_eeprom_write,
> + .llseek = default_llseek
> +};
> +
>  static ssize_t amdgpu_ras_sysfs_read(struct device *dev,
>   struct device_attribute *attr, char *buf)  {
> @@ -951,6 +969,8 @@ static void amdgpu_ras_debugfs_create_ctrl_node(struct amdgpu_device *adev)
>   con->dir = debugfs_create_dir("ras", minor->debugfs_root);
>   con->ent = debugfs_create_file("ras_ctrl", S_IWUGO | S_IRUGO, con->dir,
>  adev, &amdgpu_ras_debugfs_ctrl_ops);
> + con->ent = debugfs_create_file("ras_eeprom_reset", S_IWUGO | S_IRUGO, con->dir,
> +adev, &amdgpu_ras_debugfs_eeprom_ops);
>  }
> 
>  void amdgpu_ras_debugfs_create(struct amdgpu_device *adev,
> --
> 2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH 1/2] drm/amdgpu: initialize ras structures for xgmi block (v2)

2019-09-09 Thread Chen, Guchun


-Original Message-
From: Zhang, Hawking  
Sent: Tuesday, September 10, 2019 11:24 AM
To: amd-gfx@lists.freedesktop.org; Chen, Guchun ; Zhou1, 
Tao ; Li, Dennis ; Deucher, Alexander 

Cc: Zhang, Hawking 
Subject: [PATCH 1/2] drm/amdgpu: initialize ras structures for xgmi block (v2)

init ras common interface and fs node for xgmi block

v2: remove unnecesary physical node number check before invoking 
amdgpu_xgmi_ras_late_init
[Guchun]A typo, s/unnecesary/unnecessary. With that fixed, series is: 
Reviewed-by: Guchun Chen 

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h  |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 36 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h |  1 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    |  4 +++-
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 232a8ff..8c8547c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -123,6 +123,7 @@ struct amdgpu_xgmi {
/* gpu list in the same hive */
struct list_head head;
bool supported;
+   struct ras_common_if *ras_if;
 };
 
 struct amdgpu_gmc {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 65aae75..7f6f2e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -25,6 +25,7 @@
 #include "amdgpu.h"
 #include "amdgpu_xgmi.h"
 #include "amdgpu_smu.h"
+#include "amdgpu_ras.h"
 #include "df/df_3_6_offset.h"
 
 static DEFINE_MUTEX(xgmi_mutex);
@@ -437,3 +438,38 @@ void amdgpu_xgmi_remove_device(struct amdgpu_device *adev)
mutex_unlock(&hive->hive_lock);
}
 }
+
+int amdgpu_xgmi_ras_late_init(struct amdgpu_device *adev)
+{
+   int r;
+   struct ras_ih_if ih_info = {
+   .cb = NULL,
+   };
+   struct ras_fs_if fs_info = {
+   .sysfs_name = "xgmi_wafl_err_count",
+   .debugfs_name = "xgmi_wafl_err_inject",
+   };
+
+   if (!adev->gmc.xgmi.supported ||
+   adev->gmc.xgmi.num_physical_nodes == 0)
+   return 0;
+
+   if (!adev->gmc.xgmi.ras_if) {
+   adev->gmc.xgmi.ras_if = kmalloc(sizeof(struct ras_common_if), 
GFP_KERNEL);
+   if (!adev->gmc.xgmi.ras_if)
+   return -ENOMEM;
+   adev->gmc.xgmi.ras_if->block = AMDGPU_RAS_BLOCK__XGMI_WAFL;
+   adev->gmc.xgmi.ras_if->type = 
AMDGPU_RAS_ERROR__MULTI_UNCORRECTABLE;
+   adev->gmc.xgmi.ras_if->sub_block_index = 0;
+   strcpy(adev->gmc.xgmi.ras_if->name, "xgmi_wafl");
+   }
+   ih_info.head = fs_info.head = *adev->gmc.xgmi.ras_if;
+   r = amdgpu_ras_late_init(adev, adev->gmc.xgmi.ras_if,
+&fs_info, &ih_info);
+   if (r || !amdgpu_ras_is_supported(adev, adev->gmc.xgmi.ras_if->block)) {
+   kfree(adev->gmc.xgmi.ras_if);
+   adev->gmc.xgmi.ras_if = NULL;
+   }
+
+   return r;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
index fbcee31..9023789 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h
@@ -42,6 +42,7 @@ void amdgpu_xgmi_remove_device(struct amdgpu_device *adev);  
int amdgpu_xgmi_set_pstate(struct amdgpu_device *adev, int pstate);  int 
amdgpu_xgmi_get_hops_count(struct amdgpu_device *adev,
struct amdgpu_device *peer_adev);
+int amdgpu_xgmi_ras_late_init(struct amdgpu_device *adev);
 
 static inline bool amdgpu_xgmi_same_hive(struct amdgpu_device *adev,
struct amdgpu_device *bo_adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 0d06c79..4b10692 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -51,6 +51,7 @@
 #include "ivsrcid/vmc/irqsrcs_vmc_1_0.h"
 
 #include "amdgpu_ras.h"
+#include "amdgpu_xgmi.h"
 
 /* add these here since we already include dce12 headers and these are for DCN 
*/
 #define mmHUBP0_DCSURF_PRI_VIEWPORT_DIMENSION			0x055d
@@ -802,7 +803,8 @@ static int gmc_v9_0_ecc_late_init(void *handle)
if (r)
return r;
}
-   return 0;
+
+   return amdgpu_xgmi_ras_late_init(adev);
 }
 
 static int gmc_v9_0_late_init(void *handle)
--
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[Patch] drm/amdgpu: add navi14 PCI ID for WKS SKU Pro-XLM

2019-09-09 Thread Yin, Tianci (Rico)
Hi,

NV14 adds a new workstation SKU DID; please help review.

http://ontrack-internal.amd.com/browse/SWDEV-202589

Thanks!

Rico
From 418e6a02650b5e8d89d91b3dcde3d50567133260 Mon Sep 17 00:00:00 2001
From: "Tianci.Yin" 
Date: Tue, 10 Sep 2019 13:24:05 +0800
Subject: [PATCH] drm/amdgpu: add navi14 PCI ID for WKS SKU Pro-XLM

add navi14 PCI ID for workstation SKU Pro-XLM

Change-Id: I2883fc55a03a598a2b3f89a5c1fa440d0c553ded
Signed-off-by: Tianci.Yin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6978d17..b45a5dc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1025,6 +1025,7 @@ static const struct pci_device_id pciidlist[] = {
 	{0x1002, 0x7340, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_NAVI14},
 	{0x1002, 0x7341, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_NAVI14},
 	{0x1002, 0x7347, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_NAVI14},
+	{0x1002, 0x734F, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_NAVI14},
 
 	/* Renoir */
 	{0x1002, 0x1636, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RENOIR|AMD_IS_APU},
-- 
2.7.4




Recall: [Patch] drm/amdgpu: add navi14 PCI ID for WKS SKU Pro-XLM

2019-09-09 Thread Yin, Tianci (Rico)
Yin, Tianci (Rico) would like to recall the message, "[Patch] drm/amdgpu: add 
navi14 PCI ID for WKS SKU Pro-XLM".

Re: [Patch] drm/amdgpu: add navi14 PCI ID for WKS SKU Pro-XLM

2019-09-09 Thread Yin, Tianci (Rico)
Ok, Sorry.

From: Yin, Tianci (Rico)
Sent: Tuesday, September 10, 2019 13:37
To: amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander ; Xu, Feifei 
; Xiao, Jack ; Zhang, Hawking 
; Long, Gang 
Subject: [Patch] drm/amdgpu: add navi14 PCI ID for WKS SKU Pro-XLM

Hi,

NV14 adds a new workstation SKU DID; please help review.

http://ontrack-internal.amd.com/browse/SWDEV-202589

Thanks!

Rico



[PATCH 3/3] drm/amdgpu: rename umc ras_init to err_cnt_init

2019-09-09 Thread Zhou1, Tao
this interface is related to a specific version of umc; distinguish it
from ras_late_init

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h | 2 +-
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c   | 8 
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
index 5683c51710aa..c5d8b08af731 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
@@ -63,8 +63,8 @@ int amdgpu_umc_ras_late_init(struct amdgpu_device *adev, void *ras_ih_info)
}
 
/* ras init of specific umc version */
-   if (adev->umc.funcs && adev->umc.funcs->ras_init)
-   adev->umc.funcs->ras_init(adev);
+   if (adev->umc.funcs && adev->umc.funcs->err_cnt_init)
+   adev->umc.funcs->err_cnt_init(adev);
 
return 0;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
index 6f22c9704555..3ec36d9e012a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h
@@ -54,7 +54,7 @@
adev->umc.funcs->disable_umc_index_mode(adev);
 
 struct amdgpu_umc_funcs {
-   void (*ras_init)(struct amdgpu_device *adev);
+   void (*err_cnt_init)(struct amdgpu_device *adev);
int (*ras_late_init)(struct amdgpu_device *adev, void *ras_ih_info);
void (*query_ras_error_count)(struct amdgpu_device *adev,
void *ras_error_status);
diff --git a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c 
b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
index 4cdb5c04cd17..1c0da32c1561 100644
--- a/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/umc_v6_1.c
@@ -234,7 +234,7 @@ static void umc_v6_1_query_ras_error_address(struct amdgpu_device *adev,
amdgpu_umc_for_each_channel(umc_v6_1_query_error_address);
 }
 
-static void umc_v6_1_ras_init_per_channel(struct amdgpu_device *adev,
+static void umc_v6_1_err_cnt_init_per_channel(struct amdgpu_device *adev,
 struct ras_err_data *err_data,
 uint32_t umc_reg_offset, uint32_t channel_index)
 {
@@ -264,15 +264,15 @@ static void umc_v6_1_ras_init_per_channel(struct amdgpu_device *adev,
WREG32(ecc_err_cnt_addr + umc_reg_offset, UMC_V6_1_CE_CNT_INIT);
 }
 
-static void umc_v6_1_ras_init(struct amdgpu_device *adev)
+static void umc_v6_1_err_cnt_init(struct amdgpu_device *adev)
 {
void *ras_error_status = NULL;
 
-   amdgpu_umc_for_each_channel(umc_v6_1_ras_init_per_channel);
+   amdgpu_umc_for_each_channel(umc_v6_1_err_cnt_init_per_channel);
 }
 
 const struct amdgpu_umc_funcs umc_v6_1_funcs = {
-   .ras_init = umc_v6_1_ras_init,
+   .err_cnt_init = umc_v6_1_err_cnt_init,
.ras_late_init = amdgpu_umc_ras_late_init,
.query_ras_error_count = umc_v6_1_query_ras_error_count,
.query_ras_error_address = umc_v6_1_query_ras_error_address,
-- 
2.17.1


[PATCH 1/3] drm/amdgpu: move umc late init from gmc to umc block

2019-09-09 Thread Zhou1, Tao
umc late init is umc specific; it is more suitable to be put in the umc
block
Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/Makefile |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 48 
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h |  2 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 73 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.h |  2 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   |  8 ++-
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c   |  1 +
 7 files changed, 82 insertions(+), 54 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 84614a71bb4d..91369c823ce2 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -55,7 +55,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
amdgpu_vf_error.o amdgpu_sched.o amdgpu_debugfs.o amdgpu_ids.o \
 	amdgpu_gmc.o amdgpu_mmhub.o amdgpu_xgmi.o amdgpu_csa.o amdgpu_ras.o amdgpu_vm_cpu.o \
 	amdgpu_vm_sdma.o amdgpu_pmu.o amdgpu_discovery.o amdgpu_ras_eeprom.o amdgpu_nbio.o \
-   smu_v11_0_i2c.o
+   amdgpu_umc.o smu_v11_0_i2c.o
 
 amdgpu-$(CONFIG_PERF_EVENTS) += amdgpu_pmu.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
index 51890b1d8522..dc044eec188e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
@@ -304,51 +304,3 @@ bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev, uint64_t addr,
gmc->fault_hash[hash].idx = gmc->last_fault++;
return false;
 }
-
-int amdgpu_gmc_ras_late_init(struct amdgpu_device *adev,
-void *ras_ih_info)
-{
-   int r;
-   struct ras_ih_if *ih_info = (struct ras_ih_if *)ras_ih_info;
-   struct ras_fs_if fs_info = {
-   .sysfs_name = "umc_err_count",
-   .debugfs_name = "umc_err_inject",
-   };
-
-   if (!ih_info)
-   return -EINVAL;
-
-   if (!adev->gmc.umc_ras_if) {
-   adev->gmc.umc_ras_if = kmalloc(sizeof(struct ras_common_if), GFP_KERNEL);
-   if (!adev->gmc.umc_ras_if)
-   return -ENOMEM;
-   adev->gmc.umc_ras_if->block = AMDGPU_RAS_BLOCK__UMC;
-   adev->gmc.umc_ras_if->type = AMDGPU_RAS_ERROR__MULTI_UNCORRECTABLE;
-   adev->gmc.umc_ras_if->sub_block_index = 0;
-   strcpy(adev->gmc.umc_ras_if->name, "umc");
-   }
-   ih_info->head = fs_info.head = *adev->gmc.umc_ras_if;
-
-   r = amdgpu_ras_late_init(adev, adev->gmc.umc_ras_if,
-&fs_info, ih_info);
-   if (r)
-   goto free;
-
-   if (amdgpu_ras_is_supported(adev, adev->gmc.umc_ras_if->block)) {
-   r = amdgpu_irq_get(adev, &adev->gmc.ecc_irq, 0);
-   if (r)
-   goto late_fini;
-   } else {
-   r = 0;
-   goto free;
-   }
-
-   return 0;
-
-late_fini:
-   amdgpu_ras_late_fini(adev, adev->gmc.umc_ras_if, ih_info);
-free:
-   kfree(adev->gmc.umc_ras_if);
-   adev->gmc.umc_ras_if = NULL;
-   return r;
-}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
index 232a8ff5642b..d3be51ba6349 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h
@@ -234,7 +234,5 @@ void amdgpu_gmc_agp_location(struct amdgpu_device *adev,
 struct amdgpu_gmc *mc);
 bool amdgpu_gmc_filter_faults(struct amdgpu_device *adev, uint64_t addr,
  uint16_t pasid, uint64_t timestamp);
-int amdgpu_gmc_ras_late_init(struct amdgpu_device *adev,
-void *ih_info);
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
new file mode 100644
index ..c8de127097ab
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright 2019 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AU

[PATCH 2/3] drm/amdgpu: move umc ras init to umc block

2019-09-09 Thread Zhou1, Tao
move umc ras init from the ras module to the umc block; the generic ras
module should pay less attention to specific ras blocks.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 119bedc9802a..a9aba06c9452 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1653,10 +1653,6 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
if (amdgpu_ras_fs_init(adev))
goto fs_out;
 
-   /* ras init for each ras block */
-   if (adev->umc.funcs->ras_init)
-   adev->umc.funcs->ras_init(adev);
-
DRM_INFO("RAS INFO: ras initialized successfully, "
"hardware ability[%x] ras_mask[%x]\n",
con->hw_supported, con->supported);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
index c8de127097ab..5683c51710aa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
@@ -62,6 +62,10 @@ int amdgpu_umc_ras_late_init(struct amdgpu_device *adev, void *ras_ih_info)
goto free;
}
 
+   /* ras init of specific umc version */
+   if (adev->umc.funcs && adev->umc.funcs->ras_init)
+   adev->umc.funcs->ras_init(adev);
+
return 0;
 
 late_fini:
-- 
2.17.1


RE: [PATCH 3/3] drm/amdgpu: rename umc ras_init to err_cnt_init

2019-09-09 Thread Chen, Guchun
Series is: Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Zhou1, Tao  
Sent: Tuesday, September 10, 2019 2:31 PM
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking ; 
Chen, Guchun 
Cc: Zhou1, Tao 
Subject: [PATCH 3/3] drm/amdgpu: rename umc ras_init to err_cnt_init

this interface is related to a specific version of umc; distinguish it
from ras_late_init

Signed-off-by: Tao Zhou 
