Re: Fw: [PATCH] drm/amdgpu: add amdgpu_timeout_ring_* file to debugfs

2023-06-15 Thread Nicolai Hähnle
On Thu, Jun 15, 2023 at 9:47 AM Christian König
 wrote:
> >> Uh, that's very dangerous what you do here and wouldn't work in a whole
> >> bunch of cases.
> > Please elaborate: *what* case doesn't work?
>
> The core memory management can wait at any time for the GPU reset to finish.
>
> If we set the timeout to infinity we risk just deadlocking the kernel.

Okay, thanks. I may have seen some aspect of this before in cases
where GPU reset failed and left processes in an unkillable state.

I'll be honest, I've seen my fair share of exotic GPU hangs and to put
it mildly I'm not impressed by the kernel's handling of them.
Obviously you know much more about the intricacies of kernel memory
management than I do, but the fact that processes can end up in an
unkillable state for *any* GPU-related reason feels to me like the
result of a bad design decision somewhere.

But anyway, I'm not even asking you to fix those problems. All I'm
asking you is to let me do *my* job, part of which is to help prevent
GPU hangs from happening in the first place. For that, I need useful
debugging facilities -- and so do others.


> > Again, being able to disable GPU recovery is a crucial debugging
> > feature. We need to be able to inspect the live state of hung shaders,
> > and we need to be able to single-step through shaders. All of that
> > requires disabling GPU recovery.
>
> Yeah, I'm perfectly aware of that. The problem is this is just *not*
> supported on Linux for graphics shaders.
>
> What you can do is to run the shader with something like CWSR enabled
> (or a yet-to-be-invented graphics equivalent). Since we are debugging the
> shader anyway that should be possible I think.
>
> > Forcing people to reboot just to be able to disable GPU recovery for
> > debugging is developer hostile.
>
> Well, I think you misunderstood me. The suggestion was even to force
> them to re-compile the kernel driver to disable GPU recovery.
>
> Disabling GPU recovery is *not* something you can do and expect the
> system to be stable.
>
> The only case we can do that is when we attach a JTAG debugger in an AMD
> lab.

You're being *completely* unreasonable here. Even Windows(!) allows
disabling GPU recovery at runtime from software, and Windows is
usually far more developer hostile than Linux in these things.
Seriously, this level of hostility against developers coming from you
is not okay.

Yes, it's a tool that has sharp edges. That is perfectly well
understood. If we need to add warning labels then so be it. And if the
details of *how* to change the timeout or disable GPU recovery at
runtime should be changed, that too is fine. But it's an important
tool. Can we please just move forward on this in a pragmatic fashion?
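
For the record, the runtime workflow being argued for is nothing more exotic than the following sketch. The debugfs path is illustrative only -- the actual location depends on the DRM minor and ring name:

```shell
# Illustrative only: amdgpu_timeout_ring_<ring> lives under the device's
# debugfs directory; "dri/0" and "gfx" are example values.
T=/sys/kernel/debug/dri/0/amdgpu_timeout_ring_gfx

# Read the current timeout in milliseconds (0 = infinite, i.e. recovery disabled).
[ -r "$T" ] && cat "$T"

# Disable GPU recovery for a debugging session, then restore a 10 s timeout.
[ -w "$T" ] && echo 0 > "$T"
[ -w "$T" ] && echo 10000 > "$T"
true
```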

Thanks,
Nicolai


>
> Regards,
> Christian.
>
> >
> > So again, if there really are cases where this "doesn't work" (and
> > those cases aren't just that your desktop will freeze -- that part is
> > intentional), then let's talk through it and see how to address them.
> >
> > Thanks,
> > Nicolai
> >
> >
> >> Regards,
> >> Christian.
> >>
> >>> Signed-off-by: Nicolai Hähnle 
> >>> ---
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 32 +++-
> >>>1 file changed, 31 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> >>> index dc474b809604..32d223daa789 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> >>> @@ -471,35 +471,65 @@ static ssize_t amdgpu_debugfs_ring_read(struct file 
> >>> *f, char __user *buf,
> >>>
> >>> return result;
> >>>}
> >>>
> >>>static const struct file_operations amdgpu_debugfs_ring_fops = {
> >>> .owner = THIS_MODULE,
> >>> .read = amdgpu_debugfs_ring_read,
> >>> .llseek = default_llseek
> >>>};
> >>>
> >>> +static int amdgpu_debugfs_timeout_ring_get(void *data, u64 *val) {
> >>> + struct amdgpu_ring *ring = data;
> >>> +
> >>> + if (ring->sched.timeout == MAX_SCHEDULE_TIMEOUT)
> >>> + *val = 0;
> >>> + else
> >>> + *val = jiffies_to_msecs(ring->sched.timeout);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static int amdgpu_debugfs_timeout_ring_set(void *data, u64 val) {

Re: Fw: [PATCH] drm/amdgpu: add amdgpu_timeout_ring_* file to debugfs

2023-06-14 Thread Nicolai Hähnle
Hi Christian,

> > Report the per-ring timeout in milliseconds and allow users to adjust
> > the timeout dynamically. This can be useful for debugging, e.g. to more
> > easily test whether a submission genuinely hangs or is just taking very
> > long, and to temporarily disable GPU recovery so that shader problems
> > can be examined in detail, including single-stepping through shader
> > code.
> >
> > It feels a bit questionable to access ring->sched.timeout without any
> > locking -- under a C++ memory model it would technically be undefined
> > behavior. But it's not like a lot can go wrong here in practice, and
> > it's not clear to me what locking or atomics, if any, should be used.
>
> Uh, that's very dangerous what you do here and wouldn't work in a whole
> bunch of cases.

Please elaborate: *what* case doesn't work?


> First of all GPU recovery is part of normal operation and necessary for
> system stability. So disabling GPU recovery is actually not a good idea
> in the first place.

That's a complete non-argument because the whole point of this is that
it is a debugging feature. You're using this when the system as a
whole (most likely a UMD component) is already broken in some way.
Putting this in debugfs is not an accident.


> We already discussed that we probably need to taint the kernel if we do
> so to indicate in crash logs that the system is not considered stable
> any more. The problem was only that there wasn't an agreement on how to
> do this.

I'd be happy to add kernel tainting if you tell me how.
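
For reference, my understanding is that it would boil down to a one-liner in the debugfs write path, roughly along these lines. The flag choice is exactly the part that was never agreed on, so TAINT_USER below is purely a placeholder:

```
/* Hypothetical sketch only: taint once when a write disables recovery.
 * TAINT_USER is a placeholder flag; picking the right one (or adding a
 * new one) is the open question from the earlier discussion. */
if (val == 0 && !test_taint(TAINT_USER))
	add_taint(TAINT_USER, LOCKDEP_STILL_OK);
```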


> Since this here now makes it even easier to disable GPU recovery it's
> probably not the right approach.

Again, being able to disable GPU recovery is a crucial debugging
feature. We need to be able to inspect the live state of hung shaders,
and we need to be able to single-step through shaders. All of that
requires disabling GPU recovery.

Forcing people to reboot just to be able to disable GPU recovery for
debugging is developer hostile.

So again, if there really are cases where this "doesn't work" (and
those cases aren't just that your desktop will freeze -- that part is
intentional), then let's talk through it and see how to address them.

Thanks,
Nicolai


>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Nicolai Hähnle 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 32 +++-
> >   1 file changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > index dc474b809604..32d223daa789 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> > @@ -471,35 +471,65 @@ static ssize_t amdgpu_debugfs_ring_read(struct file 
> > *f, char __user *buf,
> >
> >return result;
> >   }
> >
> >   static const struct file_operations amdgpu_debugfs_ring_fops = {
> >.owner = THIS_MODULE,
> >.read = amdgpu_debugfs_ring_read,
> >.llseek = default_llseek
> >   };
> >
> > +static int amdgpu_debugfs_timeout_ring_get(void *data, u64 *val) {
> > + struct amdgpu_ring *ring = data;
> > +
> > + if (ring->sched.timeout == MAX_SCHEDULE_TIMEOUT)
> > + *val = 0;
> > + else
> > + *val = jiffies_to_msecs(ring->sched.timeout);
> > +
> > + return 0;
> > +}
> > +
> > +static int amdgpu_debugfs_timeout_ring_set(void *data, u64 val) {
> > + struct amdgpu_ring *ring = data;
> > +
> > + if (val == 0)
> > + ring->sched.timeout = MAX_SCHEDULE_TIMEOUT;
> > + else
> > + ring->sched.timeout = msecs_to_jiffies(val);
> > +
> > + return 0;
> > +}
> > +
> > +DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_timeout_ring_fops,
> > +  amdgpu_debugfs_timeout_ring_get,
> > +  amdgpu_debugfs_timeout_ring_set,
> > +  "%llu\n");
> > +
> >   #endif
> >
> >   void amdgpu_debugfs_ring_init(struct amdgpu_device *adev,
> >  struct amdgpu_ring *ring)
> >   {
> >   #if defined(CONFIG_DEBUG_FS)
> >struct drm_minor *minor = adev_to_drm(adev)->primary;
> >struct dentry *root = minor->debugfs_root;
> > - char name[32];
> > + char name[40];
> >
> >sprintf(name, "amdgpu_ring_%s", ring->name);
> >debugfs_create_file_size(name, S_IFREG | S_IRUGO, root, ring,

[PATCH] drm/amdgpu: add amdgpu_timeout_ring_* file to debugfs

2023-06-14 Thread Nicolai Hähnle
Report the per-ring timeout in milliseconds and allow users to adjust
the timeout dynamically. This can be useful for debugging, e.g. to more
easily test whether a submission genuinely hangs or is just taking very
long, and to temporarily disable GPU recovery so that shader problems
can be examined in detail, including single-stepping through shader
code.

It feels a bit questionable to access ring->sched.timeout without any
locking -- under a C++ memory model it would technically be undefined
behavior. But it's not like a lot can go wrong here in practice, and
it's not clear to me what locking or atomics, if any, should be used.

Signed-off-by: Nicolai Hähnle 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index dc474b809604..32d223daa789 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -471,35 +471,65 @@ static ssize_t amdgpu_debugfs_ring_read(struct file *f, char __user *buf,
 
return result;
 }
 
 static const struct file_operations amdgpu_debugfs_ring_fops = {
.owner = THIS_MODULE,
.read = amdgpu_debugfs_ring_read,
.llseek = default_llseek
 };
 
+static int amdgpu_debugfs_timeout_ring_get(void *data, u64 *val) {
+   struct amdgpu_ring *ring = data;
+
+   if (ring->sched.timeout == MAX_SCHEDULE_TIMEOUT)
+   *val = 0;
+   else
+   *val = jiffies_to_msecs(ring->sched.timeout);
+
+   return 0;
+}
+
+static int amdgpu_debugfs_timeout_ring_set(void *data, u64 val) {
+   struct amdgpu_ring *ring = data;
+
+   if (val == 0)
+   ring->sched.timeout = MAX_SCHEDULE_TIMEOUT;
+   else
+   ring->sched.timeout = msecs_to_jiffies(val);
+
+   return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(amdgpu_debugfs_timeout_ring_fops,
+amdgpu_debugfs_timeout_ring_get,
+amdgpu_debugfs_timeout_ring_set,
+"%llu\n");
+
 #endif
 
 void amdgpu_debugfs_ring_init(struct amdgpu_device *adev,
  struct amdgpu_ring *ring)
 {
 #if defined(CONFIG_DEBUG_FS)
struct drm_minor *minor = adev_to_drm(adev)->primary;
struct dentry *root = minor->debugfs_root;
-   char name[32];
+   char name[40];
 
sprintf(name, "amdgpu_ring_%s", ring->name);
debugfs_create_file_size(name, S_IFREG | S_IRUGO, root, ring,
 &amdgpu_debugfs_ring_fops,
 ring->ring_size + 12);
 
+   sprintf(name, "amdgpu_timeout_ring_%s", ring->name);
+   debugfs_create_file(name, S_IFREG | S_IRUGO | S_IWUSR, root, ring,
+   &amdgpu_debugfs_timeout_ring_fops);
 #endif
 }
 
 /**
  * amdgpu_ring_test_helper - tests ring and set sched readiness status
  *
  * @ring: ring to try the recovery on
  *
  * Tests ring and set sched readiness status
  *
-- 
2.40.0



[PATCH umr 11/17] gui/waves_panel: refactor the wave storage and wave identification

2023-06-06 Thread Nicolai Hähnle
Store the waves' and shaders' JSON objects individually in STL structures
instead of keeping everything inside of a giant parent JSON object. This
is a first step towards updating individual waves.

At the same time, identify waves by their HW ID. This makes the collapsed
overview more informative and presumably behaves better if the set of
active waves changes between queries.

Also handle the CU/WGP distinction correctly for gfx10+ and add some
robustness to the active shader display.

Signed-off-by: Nicolai Hähnle 
---
 src/app/gui/waves_panel.cpp | 280 ++--
 1 file changed, 170 insertions(+), 110 deletions(-)

diff --git a/src/app/gui/waves_panel.cpp b/src/app/gui/waves_panel.cpp
index fa4521e..7e13b48 100644
--- a/src/app/gui/waves_panel.cpp
+++ b/src/app/gui/waves_panel.cpp
@@ -18,149 +18,185 @@
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
  * USE OR OTHER DEALINGS IN THE SOFTWARE.
  *
  * The above copyright notice and this permission notice (including the
  * next paragraph) shall be included in all copies or substantial portions
  * of the Software.
  */
 #include "panels.h"
 
 #include 
+#include 
 
 class WavesPanel : public Panel {
 public:
WavesPanel(struct umr_asic *asic) : Panel(asic) {
		/* SGPR */
		shader_syntax.add_definition("(s[[:digit:]]+|s\\[[[:digit:]]+:[[:digit:]]+\\])", { "#d33682" });
		/* VGPR */
		shader_syntax.add_definition("(v[[:digit:]]+|v\\[[[:digit:]]+:[[:digit:]]+\\])", { "#6c71c4" });
		/* Constants */
		shader_syntax.add_definition("(0x[[:digit:]]*)\\b", { "#b58900" });
		/* Comments */
		shader_syntax.add_definition("(;)", { "#586e75" });
		/* Keywords */
		shader_syntax.add_definition("(attr[[:digit:]]+|exec|m0|[[:alpha:]]+cnt\\([[:digit:]]\\))", { "#3097a1" });
	}
 
+   ~WavesPanel() {
+   clear_waves_and_shaders();
+   }
+
+	void clear_waves_and_shaders() {
+		for (const auto &wave : waves)
+			json_value_free(json_object_get_wrapping_value(wave.wave));
+		waves.clear();
+
+		for (const auto &entry : shaders)
+			json_value_free(json_object_get_wrapping_value(entry.second));
+		shaders.clear();
+	}
+
+	std::string get_wave_id(JSON_Object *wave) {
+		unsigned se = (unsigned int)json_object_get_number(wave, "se");
+		unsigned sh = (unsigned int)json_object_get_number(wave, "sh");
+		unsigned cu = (unsigned int)json_object_get_number(wave, "cu");
+		unsigned wgp = (unsigned int)json_object_get_number(wave, "wgp");
+		unsigned simd_id = (unsigned int)json_object_get_number(wave, "simd_id");
+		unsigned wave_id = (unsigned int)json_object_get_number(wave, "wave_id");
+
+		char id[128];
+		if (asic->family < FAMILY_NV) {
+			snprintf(id, sizeof(id), "se%u.sa%u.cu%u.simd%u.wave%u", se, sh, cu, simd_id, wave_id);
+		} else {
+			snprintf(id, sizeof(id), "se%u.sa%u.wgp%u.simd%u.wave%u", se, sh, wgp, simd_id, wave_id);
+		}
+
+		return id;
+	}
+
+   size_t find_wave_by_id(const std::string &id) {
+   size_t i;
+   for (i = 0; i < waves.size(); ++i) {
+   if (waves[i].id == id)
+   break;
+   }
+   return i;
+   }
+
+	void update_shaders(JSON_Object *shaders_dict) {
+		int shaders_count = json_object_get_count(shaders_dict);
+		for (int i = 0; i < shaders_count; ++i) {
+			shaders.emplace(json_object_get_name(shaders_dict, i),
+					json_object(json_value_deep_copy(json_object_get_value_at(shaders_dict, i))));
+		}
+	}
+
	void process_server_message(JSON_Object *response, void *raw_data, unsigned raw_data_size) {
		JSON_Value *error = json_object_get_value(response, "error");
		if (error)
			return;

		JSON_Object *request = json_object(json_object_get_value(response, "request"));
		JSON_Value *answer = json_object_get_value(response, "answer");
		const char *command = json_object_get_string(request, "command");

-		if (strcmp(command, "waves"))
-			return;
+		if (strcmp(command, "

[PATCH umr 17/17] gui/waves_panel: grey out inactive lanes of VGPRs

2023-06-06 Thread Nicolai Hähnle
Signed-off-by: Nicolai Hähnle 
---
 src/app/gui/waves_panel.cpp | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/app/gui/waves_panel.cpp b/src/app/gui/waves_panel.cpp
index 68b06ea..cc37a1b 100644
--- a/src/app/gui/waves_panel.cpp
+++ b/src/app/gui/waves_panel.cpp
@@ -153,26 +153,30 @@ public:
 
ImGui::Separator();
if (!waves.empty()) {
			ImGui::BeginChild("Waves", ImVec2(avail.x / 2, 0), false, ImGuiWindowFlags_NoTitleBar);
			bool force_scroll = false;
			for (size_t i = 0; i < waves.size(); ++i) {
				JSON_Object *wave = waves[i].wave;
				JSON_Object *status = json_object(json_object_get_value(wave, "status"));

				int active_threads = -1;
+				uint64_t exec = 0;
				JSON_Array *threads = json_object_get_array(wave, "threads");
				if (threads) {
					active_threads = 0;
					int s = json_array_get_count(threads);
					for (int i = 0; i < s; i++) {
-						active_threads += json_array_get_boolean(threads, i);
+						bool active = json_array_get_boolean(threads, i) == 1;
+						active_threads += active ? 1 : 0;
+						if (active)
+							exec |= (uint64_t)1 << i;
					}
				}
				const char *shader_address_str = json_object_get_string(wave, "shader");

				char label[256];
				if (active_threads < 0)
					sprintf(label, "Wave %s", waves[i].id.c_str());
				else if (shader_address_str)
					sprintf(label, "Wave %s (#dbde79%d threads, valid PC)", waves[i].id.c_str(), active_threads);
				else
@@ -288,21 +292,22 @@ public:
ImGui::PopID();

ImGui::NextColumn();
}
ImGui::PopID();
ImGui::Columns(1);
ImGui::TreePop();
}
}
 
{
-						static const char *formats[] = { "#6c71c4%d", "#6c71c4%u", "#6c71c4%08x" };
+						static const char *formats_active[] = { "#6c71c4%d", "#6c71c4%u", "#6c71c4%08x", "#6c71c4%f" };
+						static const char *formats_inactive[] = { "#818181%d", "#818181%u", "#818181%08x", "#818181%f" };
						JSON_Array *vgpr = json_object_get_array(wave, "vgpr");
						if (vgpr && ImGui::TreeNodeEx("#6c71c4VGPRs")) {
							int s = json_array_get_count(vgpr);

							ImGui::BeginTable("vgprvalues", 5, ImGuiTableFlags_Borders);
							ImGui::TableSetupColumn("Base");
							ImGui::TableSetupColumn("+ 0");
							ImGui::TableSetupColumn("+ 1");
							ImGui::TableSetupColumn("+ 2");
							ImGui::TableSetupColumn("+ 3");
@@ -330,26 +335,28 @@ public:
							int num_thread = json_array_get_count(vgp);

							for (int t = 0; t < num_thread; t++) {
								if (t % 4 == 0) {

[PATCH umr 14/17] server/waves: pull out a wave_to_json function that deserves the name

2023-06-06 Thread Nicolai Hähnle
We will use this to send updates for an individual wave.

Signed-off-by: Nicolai Hähnle 
---
 src/app/gui/commands.c | 299 +
 1 file changed, 153 insertions(+), 146 deletions(-)

diff --git a/src/app/gui/commands.c b/src/app/gui/commands.c
index fbb11fa..38ae44d 100644
--- a/src/app/gui/commands.c
+++ b/src/app/gui/commands.c
@@ -1537,175 +1537,182 @@ void init_asics() {
}
 
index++;
} else {
fclose(opt.test_log_fd);
free(ip_discovery_dump);
}
}
 }
 
-static void wave_to_json(struct umr_asic *asic, int ring_is_halted, int include_shaders, JSON_Object *out) {
-	// TODO: This is using the deprecated API ...
-	struct umr_pm4_stream *stream = NULL; // umr_pm4_decode_ring(asic, asic->options.ring_name, 1, -1, -1);
-
-	struct umr_wave_data *wd = umr_scan_wave_data(asic);
-
-	JSON_Value *shaders = json_value_init_object();
+static JSON_Value *wave_to_json(struct umr_asic *asic, struct umr_wave_data *wd, int include_shaders,
+				struct umr_pm4_stream *stream, JSON_Value *shaders) {
+	uint64_t pgm_addr = (((uint64_t)wd->ws.pc_hi << 32) | wd->ws.pc_lo);
+	unsigned vmid;
+
+	JSON_Value *wave = json_value_init_object();
+	json_object_set_number(json_object(wave), "se", wd->se);
+	json_object_set_number(json_object(wave), "sh", wd->sh);
+	json_object_set_number(json_object(wave), asic->family < FAMILY_NV ? "cu" : "wgp", wd->cu);
+	json_object_set_number(json_object(wave), "simd_id", wd->ws.hw_id1.simd_id);
+	json_object_set_number(json_object(wave), "wave_id", wd->ws.hw_id1.wave_id);
+	json_object_set_number(json_object(wave), "PC", pgm_addr);
+	json_object_set_number(json_object(wave), "wave_inst_dw0", wd->ws.wave_inst_dw0);
+	if (asic->family < FAMILY_NV)
+		json_object_set_number(json_object(wave), "wave_inst_dw1", wd->ws.wave_inst_dw1);
+
+	JSON_Value *status = json_value_init_object();
+	json_object_set_number(json_object(status), "value", wd->ws.wave_status.value);
+	json_object_set_number(json_object(status), "scc", wd->ws.wave_status.scc);
+	json_object_set_number(json_object(status), "execz", wd->ws.wave_status.execz);
+	json_object_set_number(json_object(status), "vccz", wd->ws.wave_status.vccz);
+	json_object_set_number(json_object(status), "in_tg", wd->ws.wave_status.in_tg);
+	json_object_set_number(json_object(status), "halt", wd->ws.wave_status.halt);
+	json_object_set_number(json_object(status), "valid", wd->ws.wave_status.valid);
+	json_object_set_number(json_object(status), "spi_prio", wd->ws.wave_status.spi_prio);
+	json_object_set_number(json_object(status), "wave_prio", wd->ws.wave_status.wave_prio);
+	json_object_set_number(json_object(status), "priv", wd->ws.wave_status.priv);
+	json_object_set_number(json_object(status), "trap_en", wd->ws.wave_status.trap_en);
+	json_object_set_number(json_object(status), "trap", wd->ws.wave_status.trap);
+	json_object_set_number(json_object(status), "ttrace_en", wd->ws.wave_status.ttrace_en);
+	json_object_set_number(json_object(status), "export_rdy", wd->ws.wave_status.export_rdy);
+	json_object_set_number(json_object(status), "in_barrier", wd->ws.wave_status.in_barrier);
+	json_object_set_number(json_object(status), "ecc_err", wd->ws.wave_status.ecc_err);
+	json_object_set_number(json_object(status), "skip_export", wd->ws.wave_status.skip_export);
+	json_object_set_number(json_object(status), "perf_en", wd->ws.wave_status.perf_en);
+	json_object_set_number(json_object(status), "cond_dbg_user", wd->ws.wave_status.cond_dbg_user);
+	json_object_set_number(json_object(status), "cond_dbg_sys", wd->ws.wave_status.cond_dbg_sys);
+	json_object_set_number(json_object(status), "allow_replay", wd->ws.wave_status.allow_replay);
+	json_object_set_number(json_object(status), "fatal_halt", asic->family >= FAMILY_AI && wd->ws.wave_status.fatal_halt);
+	json_object_set_number(json_object(status), "must_export", wd->ws.wave_status.must_export);
+
+	json_object_set_value(json_object(wave), "status", status);
+
+	JSON_Value *hw_id = json_value_init_object();
+	if (asic->family < FAMILY_NV) {
+		json_object_set_n

[PATCH umr 13/17] server/waves: fix ring halt logic

2023-06-06 Thread Nicolai Hähnle
Whether GPRs can be read or not only depends on the state of the individual
wave, not on the state of any ring.

Use ring_is_halted only to gate the logic that tries to extract shader
references from PM4 for more convenient disassembly.

Signed-off-by: Nicolai Hähnle 
---
 src/app/gui/commands.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/src/app/gui/commands.c b/src/app/gui/commands.c
index 1e5e854..fbb11fa 100644
--- a/src/app/gui/commands.c
+++ b/src/app/gui/commands.c
@@ -1537,21 +1537,21 @@ void init_asics() {
}
 
index++;
} else {
fclose(opt.test_log_fd);
free(ip_discovery_dump);
}
}
 }
 
-static void wave_to_json(struct umr_asic *asic, int is_halted, int include_shaders, JSON_Object *out) {
+static void wave_to_json(struct umr_asic *asic, int ring_is_halted, int include_shaders, JSON_Object *out) {
	// TODO: This is using the deprecated API ...
	struct umr_pm4_stream *stream = NULL; // umr_pm4_decode_ring(asic, asic->options.ring_name, 1, -1, -1);
 
struct umr_wave_data *wd = umr_scan_wave_data(asic);
 
JSON_Value *shaders = json_value_init_object();
 
JSON_Value *waves = json_value_init_array();
while (wd) {
		uint64_t pgm_addr = (((uint64_t)wd->ws.pc_hi << 32) | wd->ws.pc_lo);
@@ -1633,21 +1633,21 @@ static void wave_to_json(struct umr_asic *asic, int is_halted, int include_shade
}
json_object_set_value(json_object(wave), "threads", threads);
 
JSON_Value *gpr_alloc = json_value_init_object();
		json_object_set_number(json_object(gpr_alloc), "vgpr_base", wd->ws.gpr_alloc.vgpr_base);
		json_object_set_number(json_object(gpr_alloc), "vgpr_size", wd->ws.gpr_alloc.vgpr_size);
		json_object_set_number(json_object(gpr_alloc), "sgpr_base", wd->ws.gpr_alloc.sgpr_base);
		json_object_set_number(json_object(gpr_alloc), "sgpr_size", wd->ws.gpr_alloc.sgpr_size);
		json_object_set_value(json_object(wave), "gpr_alloc", gpr_alloc);
 
-   if (is_halted && wd->ws.gpr_alloc.value != 0xbebebeef) {
+   if (wd->ws.gpr_alloc.value != 0xbebebeef) {
			int sgpr_count;
			if (asic->family <= FAMILY_AI) {
				int shift = asic->family <= FAMILY_CIK ? 3 : 4;
				sgpr_count = (wd->ws.gpr_alloc.sgpr_size + 1) << shift;
			} else {
				sgpr_count = 108; // regular SGPRs and VCC
			}
			JSON_Value *sgpr = json_value_init_array();
			for (int x = 0; x < sgpr_count; x++) {
				json_array_append_number(json_array(sgpr), wd->sgprs[x]);
@@ -1663,23 +1663,27 @@ static void wave_to_json(struct umr_asic *asic, int is_halted, int include_shade
				for (int thread = 0; thread < num_threads; thread++) {
					json_array_append_number(json_array(v), wd->vgprs[thread * 256 + x]);
				}
				json_array_append_value(json_array(vgpr), v);
			}
			json_object_set_value(json_object(wave), "vgpr", vgpr);
		}

		/* */
		if (include_shaders && (wd->ws.wave_status.halt || wd->ws.wave_status.fatal_halt)) {
-			struct umr_shaders_pgm *shader = umr_find_shader_in_stream(stream, vmid, pgm_addr);
+			struct umr_shaders_pgm *shader = NULL;
			uint32_t shader_size;
			uint64_t shader_addr;
+
+			if (ring_is_halted)
+				shader = umr_find_shader_in_stream(stream, vmid, pgm_addr);
+
			if (shader) {
				shader_size = shader->size;
				shader_addr = shader->addr;
			} else {
#define NUM_OPCODE_WORDS 16
				pgm_addr -= (NUM_OPCODE_WORDS*4)/2;
				shader_addr = pgm_addr;
				shader_size = NUM_OPCODE_WORDS * 4;
#undef NUM_OPCODE_WORDS
			}
@@ -2013,25 +2017,25 @@ JSON_Value *umr_proces

[PATCH umr 06/17] gfx10+: warn when halt_waves isn't used

2023-06-06 Thread Nicolai Hähnle
gfx10+ are particularly sensitive and are prone to producing completely
nonsensical information if trying to read from non-halted waves.

On gfx11, we really ought to take SQ_WAVE_VALID_AND_IDLE into account.
Rumor has it that reading from active waves can even lead to hangs,
though I've never witnessed that personally.

Signed-off-by: Nicolai Hähnle 
---
 src/app/print_waves.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/app/print_waves.c b/src/app/print_waves.c
index 04a4447..89f4abb 100644
--- a/src/app/print_waves.c
+++ b/src/app/print_waves.c
@@ -376,22 +376,25 @@ static void umr_print_waves_gfx_10_11(struct umr_asic *asic)
struct umr_wave_data *wd, *owd;
int first = 1, col = 0, ring_halted = 0, use_ring = 1;
struct umr_shaders_pgm *shader = NULL;
struct umr_packet_stream *stream = NULL;
struct {
uint32_t vmid, size;
uint64_t addr;
} ib_addr;
int start = -1, stop = -1;
 
-	if (asic->options.halt_waves)
+	if (asic->options.halt_waves) {
		umr_sq_cmd_halt_waves(asic, UMR_SQ_CMD_HALT);
+	} else {
+		fprintf(stderr, "[WARNING]: Wave listing is unreliable if waves aren't halted; use -o halt_waves\n");
+	}
 
	// don't scan for shader info by reading the ring if no_disasm is
	// requested.  This is useful for when the ring or IBs contain
	// invalid or racy data that cannot be reliably parsed.
	if (!asic->options.no_disasm && strcmp(asic->options.ring_name, "none")) {
		if (sscanf(asic->options.ring_name, "%"SCNx32"@%"SCNx64".%"SCNx32, &ib_addr.vmid, &ib_addr.addr, &ib_addr.size) == 3)
			use_ring = 0;

		if (asic->options.halt_waves) {
			// warn users if they don't specify a ring on gfx10 hardware
-- 
2.40.0



[PATCH umr 15/17] Add umr_sq_cmd_singlestep

2023-06-06 Thread Nicolai Hähnle
Allow single-stepping a wave that is selected by HW ID.

Signed-off-by: Nicolai Hähnle 
---
 src/lib/sq_cmd_halt_waves.c | 75 -
 src/umr.h   |  1 +
 2 files changed, 67 insertions(+), 9 deletions(-)

diff --git a/src/lib/sq_cmd_halt_waves.c b/src/lib/sq_cmd_halt_waves.c
index 841b1d3..9a0ae69 100644
--- a/src/lib/sq_cmd_halt_waves.c
+++ b/src/lib/sq_cmd_halt_waves.c
@@ -17,45 +17,51 @@
  * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
  * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  * Authors: Tom St Denis 
  *
  */
 #include "umr.h"
 
+static struct umr_reg *find_sq_cmd(struct umr_asic *asic)
+{
+   // SQ_CMD is not present on SI
+   if (asic->family == FAMILY_SI)
+   return 0;
+
+	struct umr_reg *reg = umr_find_reg_data_by_ip_by_instance(asic, "gfx", asic->options.vm_partition,
+								  asic->family >= FAMILY_GFX11 ? "regSQ_CMD" : "mmSQ_CMD");
+	if (!reg)
+		asic->err_msg("[BUG]: Cannot find SQ_CMD register in umr_sq_cmd_halt_waves()\n");
+	return reg;
+}
+
 /**
  * umr_sq_cmd_halt_waves - Attempt to halt or resume waves
  *
  * @mode:  Use UMR_SQ_CMD_HALT to halt waves and
  * UMR_SQ_CMD_RESUME to resume waves.
  */
int umr_sq_cmd_halt_waves(struct umr_asic *asic, enum umr_sq_cmd_halt_resume mode)
 {
struct umr_reg *reg;
uint32_t value;
uint64_t addr;
struct {
uint32_t se, sh, instance, use_grbm;
} grbm;
 
-	// SQ_CMD is not present on SI
-	if (asic->family == FAMILY_SI)
-		return 0;
-
-	reg = umr_find_reg_data_by_ip_by_instance(asic, "gfx", asic->options.vm_partition,
-						  asic->family >= FAMILY_GFX11 ? "regSQ_CMD" : "mmSQ_CMD");
-	if (!reg) {
-		asic->err_msg("[BUG]: Cannot find SQ_CMD register in umr_sq_cmd_halt_waves()\n");
+	reg = find_sq_cmd(asic);
+	if (!reg)
		return -1;
-	}
 
	// compose value
	if (asic->family == FAMILY_CIK) {
		value = umr_bitslice_compose_value(asic, reg, "CMD", mode == UMR_SQ_CMD_HALT ? 1 : 2); // SETHALT
	} else {
		value = umr_bitslice_compose_value(asic, reg, "CMD", 1); // SETHALT
		value |= umr_bitslice_compose_value(asic, reg, "DATA", mode == UMR_SQ_CMD_HALT ? 1 : 0);
	}
	value |= umr_bitslice_compose_value(asic, reg, "MODE", 1); // BROADCAST
 
@@ -76,10 +82,61 @@ int umr_sq_cmd_halt_waves(struct umr_asic *asic, enum umr_sq_cmd_halt_resume mod
asic->reg_funcs.write_reg(asic, addr, value, reg->type);
 
/* restore whatever the user had picked */
asic->options.use_bank   = grbm.use_grbm;
asic->options.bank.grbm.se   = grbm.se;
asic->options.bank.grbm.sh   = grbm.sh;
asic->options.bank.grbm.instance = grbm.instance;
 
return 0;
 }
+
+/**
+ * umr_sq_cmd_singlestep - Attempt to single-step a single wave
+ *
+ * The wave is assumed to be halted.
+ */
+int umr_sq_cmd_singlestep(struct umr_asic *asic, uint32_t se, uint32_t sh, uint32_t wgp, uint32_t simd, uint32_t wave)
+{
+	struct umr_reg *reg;
+	uint32_t value;
+	uint64_t addr;
+	struct {
+		uint32_t se, sh, instance, use_grbm;
+	} grbm;
+
+	if (asic->family < FAMILY_NV)
+		return -1; // Only supported on gfx10+
+
+	reg = find_sq_cmd(asic);
+	if (!reg)
+		return -1;
+
+	// compose value
+	value = umr_bitslice_compose_value(asic, reg, "CMD", 8); // SINGLE_STEP
+	value |= umr_bitslice_compose_value(asic, reg, "MODE", 0); // single wave
+	value |= umr_bitslice_compose_value(asic, reg, "WAVE_ID", wave);
+
+	/* copy grbm options to restore later */
+	grbm.use_grbm = asic->options.use_bank;
+	grbm.se       = asic->options.bank.grbm.se;
+	grbm.sh       = asic->options.bank.grbm.sh;
+	grbm.instance = asic->options.bank.grbm.instance;
+
+	/* set GRBM banking options */
+	asic->options.use_bank   = 1;
+	asic->options.bank.grbm.se   = se;
+	asic->options.bank.grbm.sh   = sh;
+	asic->options.bank.grbm.instance = (wgp << 2) | simd;
+
+	// compose address
+	addr = reg->addr * 4;
+	asic->reg_funcs.write_reg(asic, addr, value, reg->type);
+
+	/* restore whatever the user had picked */
+	asic->options.use_bank   = grbm.use_grbm;

[PATCH umr 16/17] gui: add a wave single step button

2023-06-06 Thread Nicolai Hähnle
Signed-off-by: Nicolai Hähnle 
---
 src/app/gui/commands.c  | 50 +
 src/app/gui/waves_panel.cpp | 40 +
 src/lib/scan_waves.c|  2 +-
 src/umr.h   |  2 ++
 4 files changed, 93 insertions(+), 1 deletion(-)

diff --git a/src/app/gui/commands.c b/src/app/gui/commands.c
index 38ae44d..fb6efa0 100644
--- a/src/app/gui/commands.c
+++ b/src/app/gui/commands.c
@@ -2036,20 +2036,70 @@ JSON_Value *umr_process_json_request(JSON_Object *request, void **raw_data, unsi
answer = json_value_init_object();
 
waves_to_json(asic, ring_is_halted, 1, json_object(answer));
 
if (disable_gfxoff && asic->fd.gfxoff >= 0) {
uint32_t value = 1;
write(asic->fd.gfxoff, &value, sizeof(value));
}
if (resume_waves)
umr_sq_cmd_halt_waves(asic, UMR_SQ_CMD_RESUME);
+   } else if (strcmp(command, "singlestep") == 0) {
+   strcpy(asic->options.ring_name, json_object_get_string(request, 
"ring"));
+
+   unsigned se = (unsigned)json_object_get_number(request, "se");
+   unsigned sh = (unsigned)json_object_get_number(request, "sh");
+   unsigned wgp = (unsigned)json_object_get_number(request, "wgp");
+   unsigned simd_id = (unsigned)json_object_get_number(request, 
"simd_id");
+   unsigned wave_id = (unsigned)json_object_get_number(request, 
"wave_id");
+
+   asic->options.skip_gprs = 0;
+   asic->options.verbose = 0;
+
+   struct umr_wave_data wd;
+   memset(&wd, 0, sizeof(wd));
+
+   int r = umr_scan_wave_slot(asic, se, sh, wgp, simd_id, wave_id, 
&wd);
+   if (r < 0) {
+   last_error = "failed to scan wave slot";
+   goto error;
+   }
+
+   // Send the single-step command in a limited retry loop because a small
+   // number of single-step commands are required before an instruction is
+   // actually issued after a branch.
+   for (int retry = 0; r == 1 && retry < 5; ++retry) {
+   umr_sq_cmd_singlestep(asic, se, sh, wgp, simd_id, 
wave_id);
+
+   struct umr_wave_data new_wd;
+   memset(&new_wd, 0, sizeof(new_wd));
+
+   r = umr_scan_wave_slot(asic, se, sh, wgp, simd_id, 
wave_id, &new_wd);
+   if (r < 0) {
+   last_error = "failed to scan wave slot";
+   goto error;
+   }
+
+   bool moved = new_wd.ws.pc_lo != wd.ws.pc_lo || 
new_wd.ws.pc_hi != wd.ws.pc_hi;
+   memcpy(&wd, &new_wd, sizeof(wd));
+   if (moved)
+   break;
+   }
+
+   answer = json_value_init_object();
+
+   if (r == 1) {
+   JSON_Value *shaders = json_value_init_object();
+   JSON_Value *wave = wave_to_json(asic, &wd, 1, /* todo: stream */ NULL, shaders);
+   json_object_set_value(json_object(answer), "wave", 
wave);
+   json_object_set_value(json_object(answer), "shaders", 
shaders);
+   }
} else if (strcmp(command, "resume-waves") == 0) {
strcpy(asic->options.ring_name, json_object_get_string(request, 
"ring"));
umr_sq_cmd_halt_waves(asic, UMR_SQ_CMD_RESUME);
answer = json_value_init_object();
} else if (strcmp(command, "ring") == 0) {
char *ring_name = (char*)json_object_get_string(request, 
"ring");
uint32_t wptr, rptr, drv_wptr, ringsize, value, *ring_data;
int halt_waves = json_object_get_boolean(request, "halt_waves");
enum umr_ring_type rt;
asic->options.halt_waves = halt_waves;
diff --git a/src/app/gui/waves_panel.cpp b/src/app/gui/waves_panel.cpp
index 7e13b48..68b06ea 100644
--- a/src/app/gui/waves_panel.cpp
+++ b/src/app/gui/waves_panel.cpp
@@ -106,21 +106,38 @@ public:
 
JSON_Array *waves_array = 
json_object_get_array(json_object(answer), "waves");
int wave_count = json_array_get_count(waves_array);
for (int i = 0; i < wave_count; ++i) {
JSON_Object *wave = 
json_object(json_value_deep_copy(json_array_get_value(waves_array, i)));
waves.emplace_back(get_wave_id(wave)
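
The retry logic in the singlestep command above can be sketched in isolation: keep issuing single-step commands until the wave's PC moves, bounded by a small retry count. A purely illustrative Python sketch (not umr code; the `step`/`read_pc` callbacks stand in for the SQ command and wave-status read):

```python
# Illustrative sketch (not umr code) of the bounded single-step retry:
# after a branch, a few SQ single-step commands may be needed before an
# instruction actually issues, so retry until the PC moves (max 5 tries).
def single_step_until_pc_moves(step, read_pc, max_retries=5):
    """step() issues one single-step command; read_pc() returns the PC."""
    pc = read_pc()
    for _ in range(max_retries):
        step()
        new_pc = read_pc()
        if new_pc != pc:
            return True   # the wave made forward progress
        pc = new_pc
    return False
```

Returning `False` here corresponds to the server replying without a "wave" object: the wave never advanced within the retry budget.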

[PATCH umr 12/17] server/waves: always report threads (exec)

2023-06-06 Thread Nicolai Hähnle
The exec mask is read the same way as the wave status, so it should be equally reliable (or unreliable).
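
For reference, the expansion the server performs is simple: thread i is live iff bit i of the 64-bit exec mask is set. An illustrative Python sketch (not umr code):

```python
# Illustrative sketch (not umr code): expand a wave's 64-bit exec mask
# (exec_hi:exec_lo) into the per-thread boolean list the server reports.
def live_threads(exec_lo: int, exec_hi: int, num_threads: int) -> list:
    exec_mask = (exec_hi << 32) | exec_lo
    return [bool(exec_mask & (1 << t)) for t in range(num_threads)]

# A wave32 with only threads 0 and 2 active:
# live_threads(0b101, 0, 32) -> [True, False, True, False, ...]
```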

Signed-off-by: Nicolai Hähnle 
---
 src/app/gui/commands.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/app/gui/commands.c b/src/app/gui/commands.c
index b7b28a7..1e5e854 100644
--- a/src/app/gui/commands.c
+++ b/src/app/gui/commands.c
@@ -1618,20 +1618,28 @@ static void wave_to_json(struct umr_asic *asic, int 
is_halted, int include_shade
json_object_set_number(json_object(hw_id), "pipe_id", 
wd->ws.hw_id2.pipe_id);
json_object_set_number(json_object(hw_id), "me_id", 
wd->ws.hw_id2.me_id);
json_object_set_number(json_object(hw_id), "state_id", 
wd->ws.hw_id2.state_id);
json_object_set_number(json_object(hw_id), "wg_id", 
wd->ws.hw_id2.wg_id);
json_object_set_number(json_object(hw_id), 
"compat_level", wd->ws.hw_id2.compat_level);
json_object_set_number(json_object(hw_id), "vm_id", 
wd->ws.hw_id2.vm_id);
vmid = wd->ws.hw_id2.vm_id;
}
json_object_set_value(json_object(wave), "hw_id", hw_id);
 
+   JSON_Value *threads = json_value_init_array();
+   int num_threads = wd->num_threads;
+   for (int thread = 0; thread < num_threads; thread++) {
+   unsigned live = thread < 32 ? (wd->ws.exec_lo & (1u << 
thread)) : (wd->ws.exec_hi & (1u << (thread - 32)));
+   json_array_append_boolean(json_array(threads), live ? 1 
: 0);
+   }
+   json_object_set_value(json_object(wave), "threads", threads);
+
JSON_Value *gpr_alloc = json_value_init_object();
json_object_set_number(json_object(gpr_alloc), "vgpr_base", 
wd->ws.gpr_alloc.vgpr_base);
json_object_set_number(json_object(gpr_alloc), "vgpr_size", 
wd->ws.gpr_alloc.vgpr_size);
json_object_set_number(json_object(gpr_alloc), "sgpr_base", 
wd->ws.gpr_alloc.sgpr_base);
json_object_set_number(json_object(gpr_alloc), "sgpr_size", 
wd->ws.gpr_alloc.sgpr_size);
json_object_set_value(json_object(wave), "gpr_alloc", 
gpr_alloc);
 
if (is_halted && wd->ws.gpr_alloc.value != 0xbebebeef) {
int sgpr_count;
if (asic->family <= FAMILY_AI) {
@@ -1639,29 +1647,20 @@ static void wave_to_json(struct umr_asic *asic, int 
is_halted, int include_shade
sgpr_count = (wd->ws.gpr_alloc.sgpr_size + 1) 
<< shift;
} else {
sgpr_count = 108; // regular SGPRs and VCC
}
JSON_Value *sgpr = json_value_init_array();
for (int x = 0; x < sgpr_count; x++) {
json_array_append_number(json_array(sgpr), 
wd->sgprs[x]);
}
json_object_set_value(json_object(wave), "sgpr", sgpr);
 
-   JSON_Value *threads = json_value_init_array();
-   int num_threads = wd->num_threads;
-   for (int thread = 0; thread < num_threads; thread++) {
-   unsigned live = thread < 32 ? (wd->ws.exec_lo & 
(1u << thread)) : (wd->ws.exec_hi & (1u << (thread - 32)));
-   json_array_append_boolean(json_array(threads), 
live ? 1 : 0);
-   }
-   json_object_set_value(json_object(wave), "threads", 
threads);
-
-
if (wd->have_vgprs) {
unsigned granularity = 
asic->parameters.vgpr_granularity;
unsigned vpgr_count = 
(wd->ws.gpr_alloc.vgpr_size + 1) << granularity;
JSON_Value *vgpr = json_value_init_array();
for (int x = 0; x < (int) vpgr_count; x++) {
JSON_Value *v = json_value_init_array();
for (int thread = 0; thread < 
num_threads; thread++) {

json_array_append_number(json_array(v), wd->vgprs[thread * 256 + x]);
}

json_array_append_value(json_array(vgpr), v);
-- 
2.40.0



[PATCH umr 10/17] gui/info_panel: correctly identify the GFX11 family

2023-06-06 Thread Nicolai Hähnle
Signed-off-by: Nicolai Hähnle 
---
 src/app/gui/info_panel.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/app/gui/info_panel.cpp b/src/app/gui/info_panel.cpp
index d9c3421..bca7137 100644
--- a/src/app/gui/info_panel.cpp
+++ b/src/app/gui/info_panel.cpp
@@ -25,21 +25,21 @@
 #include "panels.h"
 
 class InfoPanel : public Panel {
 public:
InfoPanel(struct umr_asic *asic) : Panel(asic) { }
 
void process_server_message(JSON_Object *response, void *raw_data, 
unsigned raw_data_size) {}
 
bool display(float dt, const ImVec2& avail, bool can_send_request) {
static const char *families[] = {
-   "SI", "CIK", "VI", "AI", "NV", "NPI", "CFG",
+   "SI", "CIK", "VI", "AI", "NV", "GFX11", "NPI", "CFG",
};
 
ImGui::BeginChild("Info", ImVec2(avail.x / 2, 0), false, 
ImGuiWindowFlags_NoTitleBar);
ImGui::BeginTable("Info", 2, ImGuiTableFlags_Borders);
ImGui::TableNextRow();
ImGui::TableSetColumnIndex(0); ImGui::Text("ASIC name");
ImGui::TableSetColumnIndex(1); ImGui::Text("#b58900%s", 
asic->asicname);
ImGui::TableNextRow();
ImGui::TableSetColumnIndex(0); ImGui::Text("Instance");
ImGui::TableSetColumnIndex(1); ImGui::Text("#b58900%d", 
asic->instance);
-- 
2.40.0



[PATCH umr 08/17] gfx11: wave limit is 16

2023-06-06 Thread Nicolai Hähnle
Signed-off-by: Nicolai Hähnle 
---
 src/lib/scan_waves.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/lib/scan_waves.c b/src/lib/scan_waves.c
index 3279cc2..37ebcff 100644
--- a/src/lib/scan_waves.c
+++ b/src/lib/scan_waves.c
@@ -593,21 +593,26 @@ static int umr_scan_wave_slot(struct umr_asic *asic, 
uint32_t se, uint32_t sh, u
  * \param pppwd points to the pointer-to-pointer-to the last element of a 
linked
  *  list of wave data structures, with the last element yet to be 
filled in.
  *  The pointer-to-pointer-to is updated by this function.
  */
 static int umr_scan_wave_simd(struct umr_asic *asic, uint32_t se, uint32_t sh, 
uint32_t cu, uint32_t simd,
   struct umr_wave_data ***pppwd)
 {
uint32_t wave, wave_limit;
int r;
 
-   wave_limit = asic->family <= FAMILY_AI ? 10 : 20;
+   if (asic->family <= FAMILY_AI)
+   wave_limit = 10;
+   else if (asic->family == FAMILY_NV)
+   wave_limit = 20;
+   else
+   wave_limit = 16;
 
for (wave = 0; wave < wave_limit; wave++) {
struct umr_wave_data *pwd = **pppwd;
if ((r = umr_scan_wave_slot(asic, se, sh, cu, simd, wave, pwd)) 
== 1) {
pwd->next = calloc(1, sizeof(*pwd));
if (!pwd->next) {
asic->err_msg("[ERROR]: Out of memory\n");
return -1;
}
*pppwd = &pwd->next;
-- 
2.40.0



[PATCH umr 01/17] Use the correct register prefix on Navi3 for top

2023-06-06 Thread Nicolai Hähnle
Signed-off-by: Nicolai Hähnle 
---
 src/app/top.c | 89 +++
 1 file changed, 48 insertions(+), 41 deletions(-)

diff --git a/src/app/top.c b/src/app/top.c
index f99b7b5..40ed0ac 100644
--- a/src/app/top.c
+++ b/src/app/top.c
@@ -903,30 +903,31 @@ void load_options(void)
}
} else {
// add some defaults to not be so boring
top_options.vi.grbm = 1;
top_options.vi.vgt = 1;
top_options.vi.ta = 1;
}
 }
 
 static struct {
-   char *name, *tag;
+   char name[32];
+   char *tag;
uint64_t counts[32];
int *opt, is_sensor;
uint32_t addr, mask[32], cmp[32];
uint64_t addr_mask;
struct umr_bitfield *bits;
 } stat_counters[64];
 
-#define ENTRY(_j, _name, _bits, _opt, _tag) do { int _i = (_j); 
stat_counters[_i].name = _name; stat_counters[_i].bits = _bits; 
stat_counters[_i].opt = _opt; stat_counters[_i].tag = _tag; } while (0)
-#define ENTRY_SENSOR(_j, _name, _bits, _opt, _tag) do { int _i = (_j); 
stat_counters[_i].name = _name; stat_counters[_i].bits = _bits; 
stat_counters[_i].opt = _opt; stat_counters[_i].tag = _tag; 
stat_counters[_i].is_sensor = 1; } while (0)
+#define ENTRY(_j, _prefix, _name, _bits, _opt, _tag) do { int _i = (_j); 
snprintf(stat_counters[_i].name, sizeof(stat_counters[_i].name), "%s%s", 
_prefix, _name); stat_counters[_i].bits = _bits; stat_counters[_i].opt = _opt; 
stat_counters[_i].tag = _tag; } while (0)
+#define ENTRY_SENSOR(_j, _name, _bits, _opt, _tag) do { int _i = (_j); 
strcpy(stat_counters[_i].name, _name); stat_counters[_i].bits = _bits; 
stat_counters[_i].opt = _opt; stat_counters[_i].tag = _tag; 
stat_counters[_i].is_sensor = 1; } while (0)
 
 static void vi_handle_keys(int i)
 {
switch(i) {
case 't':  top_options.vi.ta ^= 1; break;
case 'g':  top_options.vi.vgt ^= 1; break;
case 'G':  top_options.vi.gfxpwr ^= 1; break;
case 'u':  top_options.vi.uvd ^= 1; break;
case 'c':  top_options.vi.vce ^= 1; break;
case 's':  top_options.vi.grbm ^= 1; break;
@@ -963,44 +964,59 @@ static int sriov_supported_vf(struct umr_asic *asic)
 
return (sriov_ctrl & PCI_SRIOV_CTRL_VFE) ? sriov_num_vf 
: 0;
}
pci_offset = PCI_EXT_CAP_NEXT(pci_cfg_data);
}
return retval;
 }
 
 static void top_build_vi_program(struct umr_asic *asic)
 {
+   const char *gfx_prefix;
+   const char *vcn_prefix;
int i, j, k;
char *regname;
 
+   gfx_prefix = "mm";
+   struct umr_ip_block* gfx = umr_find_ip_block(asic, "gfx", 
asic->options.vm_partition);
+   if (gfx && gfx->discoverable.maj >= 11)
+   gfx_prefix = "reg";
+
+   vcn_prefix = "mm";
+   struct umr_ip_block* vcn = umr_find_ip_block(asic, "vcn", 
asic->options.vm_partition);
+   if (vcn && ((vcn->discoverable.maj == 2 && vcn->discoverable.min >= 6) 
|| vcn->discoverable.maj >= 4))
+   vcn_prefix = "reg";
+
stat_counters[0].bits = &stat_grbm_bits[0];
stat_counters[0].opt = &top_options.vi.grbm;
stat_counters[0].tag = "GRBM";
 
-   stat_counters[1].opt = &top_options.vi.grbm;
-   stat_counters[1].tag = stat_counters[0].tag;
-   stat_counters[1].name = "mmGRBM_STATUS2";
-   stat_counters[1].bits = &stat_grbm2_bits[0];
+   // which SE to read ...
+   if (options.use_bank == 1)
+   snprintf(stat_counters[0].name, sizeof(stat_counters[0].name), "%sGRBM_STATUS_SE%d", gfx_prefix, options.bank.grbm.se);
+   else
+   snprintf(stat_counters[0].name, sizeof(stat_counters[0].name), "%sGRBM_STATUS", gfx_prefix);
+
+   i = 1;
 
-   i = 2;
+   ENTRY(i++, gfx_prefix, "GRBM_STATUS2", &stat_grbm2_bits[0], 
&top_options.vi.grbm, "GRBM");
 
top_options.sriov.active_vf = -1;
top_options.sriov.num_vf = sriov_supported_vf(asic);
if (top_options.sriov.num_vf != 0) {
stat_counters[i].is_sensor = 3;
-   ENTRY(i++, "mmRLC_GPU_IOV_ACTIVE_FCN_ID", &stat_rlc_iov_bits[0],
+   ENTRY(i++, gfx_prefix, "RLC_GPU_IOV_ACTIVE_FCN_ID", 
&stat_rlc_iov_bits[0],
&top_options.vi.grbm, "GPU_IOV");
}
 
if (asic->config.gfx.family > 110)
-   ENTRY(i++, "mmRLC_GPM_STAT", &stat_rlc_gpm_bits[0], &top_options.vi.gfxpwr, "GFX PWR");
+   ENTRY(i++, gfx_prefix, "RLC_GPM_STAT", &stat_rlc_gpm_bits[0], &top_options.vi.gfxpwr, "GFX PWR");

[PATCH umr 03/17] Silence a warning

2023-06-06 Thread Nicolai Hähnle
The function has a code path in which addr isn't set to a meaningful
value, but it is still written to the test log file. Write 0 instead of
garbage.

Signed-off-by: Nicolai Hähnle 
---
 src/lib/lowlevel/linux/read_gprwave.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/lib/lowlevel/linux/read_gprwave.c 
b/src/lib/lowlevel/linux/read_gprwave.c
index 86320f5..e861ee4 100644
--- a/src/lib/lowlevel/linux/read_gprwave.c
+++ b/src/lib/lowlevel/linux/read_gprwave.c
@@ -359,21 +359,21 @@ int umr_read_vgprs(struct umr_asic *asic, struct 
umr_wave_status *ws, uint32_t t
umr_grbm_select_index(asic, 0x, 0x, 
0x);
}
return 0;
}
 }
 
 int umr_get_wave_status(struct umr_asic *asic, unsigned se, unsigned sh, 
unsigned cu, unsigned simd, unsigned wave, struct umr_wave_status *ws)
 {
uint32_t buf[32];
int r;
-   uint64_t addr;
+   uint64_t addr = 0;
struct amdgpu_debugfs_gprwave_iocdata id;
 
memset(buf, 0, sizeof buf);
 
if (asic->fd.gprwave >= 0) {
memset(&id, 0, sizeof id);
id.gpr_or_wave = 0;
id.se = se;
id.sh = sh;
id.cu = cu;
-- 
2.40.0



[PATCH umr 09/17] gfx11: ignore wave status fields that were removed

2023-06-06 Thread Nicolai Hähnle
Signed-off-by: Nicolai Hähnle 
---
 src/lib/scan_waves.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/lib/scan_waves.c b/src/lib/scan_waves.c
index 37ebcff..ca1d9fb 100644
--- a/src/lib/scan_waves.c
+++ b/src/lib/scan_waves.c
@@ -451,77 +451,77 @@ static int umr_parse_wave_data_gfx_10_11(struct umr_asic 
*asic, struct umr_wave_
ws->hw_id2.vm_id= umr_bitslice_reg(asic, reg, "VM_ID", 
value);
ws->hw_id2.compat_level = umr_bitslice_reg_quiet(asic, reg, 
"COMPAT_LEVEL", value); // not on 10.3
 
if (asic->family < FAMILY_GFX11)
ws->wave_inst_dw0 = buf[x++];
 
ws->gpr_alloc.value = value = buf[x++];
reg = umr_find_reg_data_by_ip_by_instance(asic, "gfx", 
asic->options.vm_partition, "ixSQ_WAVE_GPR_ALLOC");
ws->gpr_alloc.vgpr_base = umr_bitslice_reg(asic, reg, 
"VGPR_BASE", value);
ws->gpr_alloc.vgpr_size = umr_bitslice_reg(asic, reg, 
"VGPR_SIZE", value);
-   ws->gpr_alloc.sgpr_base = umr_bitslice_reg(asic, reg, 
"SGPR_BASE", value);
-   ws->gpr_alloc.sgpr_size = umr_bitslice_reg(asic, reg, 
"SGPR_SIZE", value);
 
ws->lds_alloc.value = value = buf[x++];
reg = umr_find_reg_data_by_ip_by_instance(asic, "gfx", 
asic->options.vm_partition, "ixSQ_WAVE_LDS_ALLOC");
ws->lds_alloc.lds_base = umr_bitslice_reg(asic, reg, 
"LDS_BASE", value);
ws->lds_alloc.lds_size = umr_bitslice_reg(asic, reg, 
"LDS_SIZE", value);
ws->lds_alloc.vgpr_shared_size = umr_bitslice_reg(asic, reg, 
"VGPR_SHARED_SIZE", value);
 
ws->trapsts.value = value = buf[x++];
reg = umr_find_reg_data_by_ip_by_instance(asic, "gfx", 
asic->options.vm_partition, "ixSQ_WAVE_TRAPSTS");
ws->trapsts.excp  = umr_bitslice_reg(asic, reg, "EXCP", 
value) |

(umr_bitslice_reg(asic, reg, "EXCP_HI", value) << 9);
ws->trapsts.savectx   = umr_bitslice_reg(asic, reg, 
"SAVECTX", value);
ws->trapsts.illegal_inst  = umr_bitslice_reg(asic, reg, 
"ILLEGAL_INST", value);
ws->trapsts.excp_hi   = umr_bitslice_reg(asic, reg, 
"EXCP_HI", value);
ws->trapsts.buffer_oob= umr_bitslice_reg(asic, reg, 
"BUFFER_OOB", value);
-   ws->trapsts.excp_cycle= umr_bitslice_reg(asic, reg, 
"EXCP_CYCLE", value);
+   ws->trapsts.excp_cycle= umr_bitslice_reg_quiet(asic, reg, 
"EXCP_CYCLE", value);
ws->trapsts.excp_group_mask = umr_bitslice_reg_quiet(asic, reg, 
"EXCP_GROUP_MASK", value);
-   ws->trapsts.excp_wave64hi = umr_bitslice_reg(asic, reg, 
"EXCP_WAVE64HI", value);
+   ws->trapsts.excp_wave64hi = umr_bitslice_reg_quiet(asic, reg, 
"EXCP_WAVE64HI", value);
ws->trapsts.xnack_error   = umr_bitslice_reg_quiet(asic, reg, 
"XNACK_ERROR", value);
ws->trapsts.utc_error = umr_bitslice_reg_quiet(asic, reg, 
"UTC_ERROR", value);
-   ws->trapsts.dp_rate   = umr_bitslice_reg(asic, reg, 
"DP_RATE", value);
+   ws->trapsts.dp_rate   = umr_bitslice_reg_quiet(asic, reg, 
"DP_RATE", value);
 
ws->ib_sts.value = value = buf[x++];
reg = umr_find_reg_data_by_ip_by_instance(asic, "gfx", 
asic->options.vm_partition, "ixSQ_WAVE_IB_STS");
-   ws->ib_sts.vm_cnt   = umr_bitslice_reg(asic, reg, "VM_CNT", 
value) |
- 
(umr_bitslice_reg(asic, reg, "VM_CNT_HI", value) << 4);
+   ws->ib_sts.vm_cnt   = umr_bitslice_reg(asic, reg, "VM_CNT", 
value);
+   if (asic->family == FAMILY_NV)
+   ws->ib_sts.vm_cnt |= (umr_bitslice_reg(asic, reg, 
"VM_CNT_HI", value) << 4);
ws->ib_sts.exp_cnt  = umr_bitslice_reg(asic, reg, "EXP_CNT", 
value);
-   ws->ib_sts.lgkm_cnt = umr_bitslice_reg(asic, reg, "LGKM_CNT", 
value) |
- 
(umr_bitslice_reg(asic, reg, "LGKM_CNT_BIT4", value) << 4) |
- 
(umr_bitslice_reg(asic, reg, "LGKM_CNT_BIT5", value) << 5);
-   ws->ib_sts.valu_cnt = umr_bitslice_reg(asic, reg, "VALU_CNT"

[PATCH umr 04/17] gfx10+: iterate only over existing WGPs when scanning waves

2023-06-06 Thread Nicolai Hähnle
We overload "cu" to mean "wgp" in a bunch of places, but max_cu_per_sh
is always in terms of CUs.
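
The mapping can be sketched as follows; the instance encoding matches the (wgp << 2) | simd composition used elsewhere in this series. Illustrative Python, not umr code:

```python
# Illustrative sketch (not umr code): on gfx10+, a WGP packs two CUs, so the
# wave scan iterates max_cu_per_sh / 2 WGPs and encodes the GRBM instance
# select as (wgp << 2) | simd.
def many_to_instance(wgp: int, simd: int) -> int:
    assert 0 <= simd < 4
    return (wgp << 2) | simd

def scan_targets(max_cu_per_sh: int):
    """Yield (wgp, simd, instance) for one shader array."""
    for wgp in range(max_cu_per_sh // 2):  # CUs are paired into WGPs
        for simd in range(4):
            yield wgp, simd, many_to_instance(wgp, simd)
```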

Signed-off-by: Nicolai Hähnle 
---
 src/lib/scan_waves.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/src/lib/scan_waves.c b/src/lib/scan_waves.c
index 767520c..3279cc2 100644
--- a/src/lib/scan_waves.c
+++ b/src/lib/scan_waves.c
@@ -618,48 +618,50 @@ static int umr_scan_wave_simd(struct umr_asic *asic, 
uint32_t se, uint32_t sh, u
return 0;
 }
 
 /**
  * umr_scan_wave_data - Scan for any halted valid waves
  *
  * Returns NULL on error (or no waves found).
  */
 struct umr_wave_data *umr_scan_wave_data(struct umr_asic *asic)
 {
-   uint32_t se, sh, cu, simd;
+   uint32_t se, sh, simd;
struct umr_wave_data *ohead, *head, **ptail;
int r;
 
ohead = head = calloc(1, sizeof *head);
if (!head) {
asic->err_msg("[ERROR]: Out of memory\n");
return NULL;
}
ptail = &head;
 
for (se = 0; se < asic->config.gfx.max_shader_engines; se++)
-   for (sh = 0; sh < asic->config.gfx.max_sh_per_se; sh++)
-   for (cu = 0; cu < asic->config.gfx.max_cu_per_sh; cu++) {
+   for (sh = 0; sh < asic->config.gfx.max_sh_per_se; sh++) {
if (asic->family <= FAMILY_AI) {
-   asic->wave_funcs.get_wave_sq_info(asic, se, sh, cu, 
&(*ptail)->ws);
-   if ((*ptail)->ws.sq_info.busy) {
-   for (simd = 0; simd < 4; simd++) {
-   r = umr_scan_wave_simd(asic, se, sh, 
cu, simd, &ptail);
-   if (r < 0)
-   goto error;
+   for (uint32_t cu = 0; cu < 
asic->config.gfx.max_cu_per_sh; cu++) {
+   asic->wave_funcs.get_wave_sq_info(asic, se, sh, 
cu, &(*ptail)->ws);
+   if ((*ptail)->ws.sq_info.busy) {
+   for (simd = 0; simd < 4; simd++) {
+   r = umr_scan_wave_simd(asic, 
se, sh, cu, simd, &ptail);
+   if (r < 0)
+   goto error;
+   }
}
}
} else {
+   for (uint32_t wgp = 0; wgp < 
asic->config.gfx.max_cu_per_sh / 2; wgp++)
for (simd = 0; simd < 4; simd++) {
-   asic->wave_funcs.get_wave_sq_info(asic, se, sh, 
MANY_TO_INSTANCE(cu, simd), &(*ptail)->ws);
+   asic->wave_funcs.get_wave_sq_info(asic, se, sh, 
MANY_TO_INSTANCE(wgp, simd), &(*ptail)->ws);
if ((*ptail)->ws.sq_info.busy) {
-   r = umr_scan_wave_simd(asic, se, sh, 
cu, simd, &ptail);
+   r = umr_scan_wave_simd(asic, se, sh, 
wgp, simd, &ptail);
if (r < 0)
goto error;
}
}
}
}
 
// drop the pre-allocated tail node
free(*ptail);
*ptail = NULL;
-- 
2.40.0



[PATCH umr 05/17] gfx10+: fix SGPR counts

2023-06-06 Thread Nicolai Hähnle
On gfx10+, every wave has 106 regular SGPRs followed immediately by VCC,
meaning we should show 108 SGPRs by default.

They are followed by 16 TTMPs, for 124 in total.
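
The register layout described above can be sketched as follows (illustrative Python, not umr code; the slot names are for exposition only):

```python
# Illustrative sketch of the gfx10+ scalar register layout: 106 regular
# SGPRs, then VCC_LO/VCC_HI, then 16 TTMPs -- 124 slots in total. The patch
# prints the first 108 entries (regular SGPRs plus VCC) by default.
def sgpr_slot_name(index: int) -> str:
    if 0 <= index < 106:
        return f"s{index}"
    if index in (106, 107):
        return "vcc_lo" if index == 106 else "vcc_hi"
    if 108 <= index < 124:
        return f"ttmp{index - 108}"
    raise ValueError(f"no scalar slot {index}")
```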

Signed-off-by: Nicolai Hähnle 
---
 src/app/gui/commands.c| 16 
 src/app/print_waves.c |  4 ++--
 src/lib/lowlevel/linux/read_gprwave.c |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/app/gui/commands.c b/src/app/gui/commands.c
index 45bb9d4..b7b28a7 100644
--- a/src/app/gui/commands.c
+++ b/src/app/gui/commands.c
@@ -1626,29 +1626,29 @@ static void wave_to_json(struct umr_asic *asic, int 
is_halted, int include_shade
json_object_set_value(json_object(wave), "hw_id", hw_id);
 
JSON_Value *gpr_alloc = json_value_init_object();
json_object_set_number(json_object(gpr_alloc), "vgpr_base", 
wd->ws.gpr_alloc.vgpr_base);
json_object_set_number(json_object(gpr_alloc), "vgpr_size", 
wd->ws.gpr_alloc.vgpr_size);
json_object_set_number(json_object(gpr_alloc), "sgpr_base", 
wd->ws.gpr_alloc.sgpr_base);
json_object_set_number(json_object(gpr_alloc), "sgpr_size", 
wd->ws.gpr_alloc.sgpr_size);
json_object_set_value(json_object(wave), "gpr_alloc", 
gpr_alloc);
 
if (is_halted && wd->ws.gpr_alloc.value != 0xbebebeef) {
-   int shift;
-   if (asic->family <= FAMILY_CIK || asic->family >= 
FAMILY_NV)
-   shift = 3;
-   else
-   shift = 4;
-
-   int spgr_count = (wd->ws.gpr_alloc.sgpr_size + 1) << 
shift;
+   int sgpr_count;
+   if (asic->family <= FAMILY_AI) {
+   int shift = asic->family <= FAMILY_CIK ? 3 : 4;
+   sgpr_count = (wd->ws.gpr_alloc.sgpr_size + 1) 
<< shift;
+   } else {
+   sgpr_count = 108; // regular SGPRs and VCC
+   }
JSON_Value *sgpr = json_value_init_array();
-   for (int x = 0; x < spgr_count; x++) {
+   for (int x = 0; x < sgpr_count; x++) {
json_array_append_number(json_array(sgpr), 
wd->sgprs[x]);
}
json_object_set_value(json_object(wave), "sgpr", sgpr);
 
JSON_Value *threads = json_value_init_array();
int num_threads = wd->num_threads;
for (int thread = 0; thread < num_threads; thread++) {
unsigned live = thread < 32 ? (wd->ws.exec_lo & 
(1u << thread)) : (wd->ws.exec_hi & (1u << (thread - 32)));
json_array_append_boolean(json_array(threads), 
live ? 1 : 0);
}
diff --git a/src/app/print_waves.c b/src/app/print_waves.c
index de93f93..04a4447 100644
--- a/src/app/print_waves.c
+++ b/src/app/print_waves.c
@@ -467,21 +467,21 @@ static void umr_print_waves_gfx_10_11(struct umr_asic 
*asic)
(unsigned)wd->ws.hw_id1.wave_id, // 
TODO: wgp printed out won't match geometry for now w.r.t. to SPI
(unsigned 
long)wd->ws.wave_status.value, (unsigned long)wd->ws.pc_hi, (unsigned 
long)wd->ws.pc_lo,
(unsigned long)wd->ws.wave_inst_dw0, 
(unsigned long)wd->ws.exec_hi, (unsigned long)wd->ws.exec_lo,
(unsigned long)wd->ws.hw_id1.value, 
(unsigned long)wd->ws.hw_id2.value, (unsigned long)wd->ws.gpr_alloc.value,
(unsigned long)wd->ws.lds_alloc.value, 
(unsigned long)wd->ws.trapsts.value,
(unsigned long)wd->ws.ib_sts.value, 
(unsigned long)wd->ws.ib_sts2.value, (unsigned long)wd->ws.ib_dbg1,
(unsigned long)wd->ws.m0, (unsigned 
long)wd->ws.mode.value);
}
 
if (wd->ws.wave_status.halt || 
wd->ws.wave_status.fatal_halt) {
-   for (x = 0; x < 112; x += 4)
+   for (x = 0; x < 108; x += 4)
printf(">SGPRS[%u..%u] = { %08lx, 
%08lx, %08lx, %08lx }\n",
(unsigned)(x),
(unsigned)(x + 3),
(unsigned long)wd->sgprs[x],
 

[PATCH umr 07/17] gfx11: enable wave scanning

2023-06-06 Thread Nicolai Hähnle
Signed-off-by: Nicolai Hähnle 
---
 src/lib/lowlevel/linux/read_gprwave.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/lib/lowlevel/linux/read_gprwave.c 
b/src/lib/lowlevel/linux/read_gprwave.c
index 6d68b7e..d357896 100644
--- a/src/lib/lowlevel/linux/read_gprwave.c
+++ b/src/lib/lowlevel/linux/read_gprwave.c
@@ -425,14 +425,12 @@ int umr_get_wave_status(struct umr_asic *asic, unsigned 
se, unsigned sh, unsigne
}
fprintf(asic->options.test_log_fd, "}\n");
}
 
 
return umr_parse_wave_data_gfx(asic, ws, buf);
 }
 
 int umr_get_wave_sq_info(struct umr_asic *asic, unsigned se, unsigned sh, 
unsigned cu, struct umr_wave_status *ws)
 {
-   if (asic->family <= FAMILY_NV)
-   return umr_get_wave_sq_info_vi(asic, se, sh, cu, ws);
-   return -1;
+   return umr_get_wave_sq_info_vi(asic, se, sh, cu, ws);
 }
-- 
2.40.0



[PATCH umr 02/17] Use the correct prefix for Navi3 in halt_waves

2023-06-06 Thread Nicolai Hähnle
Signed-off-by: Nicolai Hähnle 
---
 src/lib/sq_cmd_halt_waves.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/lib/sq_cmd_halt_waves.c b/src/lib/sq_cmd_halt_waves.c
index 368e701..841b1d3 100644
--- a/src/lib/sq_cmd_halt_waves.c
+++ b/src/lib/sq_cmd_halt_waves.c
@@ -36,21 +36,22 @@ int umr_sq_cmd_halt_waves(struct umr_asic *asic, enum 
umr_sq_cmd_halt_resume mod
uint32_t value;
uint64_t addr;
struct {
uint32_t se, sh, instance, use_grbm;
} grbm;
 
// SQ_CMD is not present on SI
if (asic->family == FAMILY_SI)
return 0;
 
-   reg = umr_find_reg_data_by_ip_by_instance(asic, "gfx", 
asic->options.vm_partition, "mmSQ_CMD");
+   reg = umr_find_reg_data_by_ip_by_instance(asic, "gfx", 
asic->options.vm_partition,
+ asic->family >= FAMILY_GFX11 
? "regSQ_CMD" : "mmSQ_CMD");
if (!reg) {
asic->err_msg("[BUG]: Cannot find SQ_CMD register in 
umr_sq_cmd_halt_waves()\n");
return -1;
}
 
// compose value
if (asic->family == FAMILY_CIK) {
value = umr_bitslice_compose_value(asic, reg, "CMD", mode == 
UMR_SQ_CMD_HALT ? 1 : 2); // SETHALT
} else {
value = umr_bitslice_compose_value(asic, reg, "CMD", 1); // 
SETHALT
-- 
2.40.0



[PATCH umr 00/17] Various fixes and features for shader debugging on gfx11

2023-06-06 Thread Nicolai Hähnle
Hi,

This series does a bunch of things that I found necessary and useful for
shader debugging on gfx11:

* Fix a bunch of trivial gfx11/Navi3-specific issues like misidentifying the
  ASIC family
* Enable wave scanning on gfx11 and fix some related issues
* Clean up a bunch of things around the "waves" panel in the GUI, e.g. fix
  the behavior of the waves tree view when waves disappear
* Add the ability to single-step a chosen wave

Thanks,
Nicolai




Re: [PATCH 5/6] drm/amdgpu: add timeline support in amdgpu CS

2018-09-26 Thread Nicolai Hähnle

Hey Chunming,

On 20.09.2018 13:03, Chunming Zhou wrote:

@@ -1113,48 +1117,91 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct 
amdgpu_cs_parser *p,
  }
  
  static int amdgpu_cs_process_syncobj_in_dep(struct amdgpu_cs_parser *p,

-   struct amdgpu_cs_chunk *chunk)
+   struct amdgpu_cs_chunk *chunk,
+   bool timeline)
  {
unsigned num_deps;
int i, r;
-   struct drm_amdgpu_cs_chunk_sem *deps;
  
-	deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;

-   num_deps = chunk->length_dw * 4 /
-   sizeof(struct drm_amdgpu_cs_chunk_sem);
+   if (!timeline) {
+   struct drm_amdgpu_cs_chunk_sem *deps;
  
-	for (i = 0; i < num_deps; ++i) {

-   r = amdgpu_syncobj_lookup_and_add_to_sync(p, deps[i].handle);
+   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_sem);
+   for (i = 0; i < num_deps; ++i) {
+   r = amdgpu_syncobj_lookup_and_add_to_sync(p, 
deps[i].handle,
+ 0, 0);
if (r)
return r;


The indentation looks wrong.



+   }
+   } else {
+   struct drm_amdgpu_cs_chunk_syncobj *syncobj_deps;
+
+   syncobj_deps = (struct drm_amdgpu_cs_chunk_syncobj 
*)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_syncobj);
+   for (i = 0; i < num_deps; ++i) {
+   r = amdgpu_syncobj_lookup_and_add_to_sync(p, 
syncobj_deps[i].handle,
+ 
syncobj_deps[i].point,
+ 
syncobj_deps[i].flags);
+   if (r)
+   return r;


Here as well.

So I'm wondering a bit about this uapi. Specifically, what happens if 
you try to use timeline syncobjs here as dependencies _without_ 
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT?


My understanding is, it'll just return -EINVAL without any indication as 
to which syncobj actually failed. What's the caller supposed to do then?


Cheers,
Nicolai
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 5/6] drm/amdgpu: add timeline support in amdgpu CS

2018-09-26 Thread Nicolai Hähnle

  static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 1ceec56de015..412359b446f1 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -517,6 +517,8 @@ struct drm_amdgpu_gem_va {
  #define AMDGPU_CHUNK_ID_SYNCOBJ_IN  0x04
  #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT 0x05
  #define AMDGPU_CHUNK_ID_BO_HANDLES  0x06
+#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT0x07
+#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL  0x08
  
  struct drm_amdgpu_cs_chunk {

__u32   chunk_id;
@@ -592,6 +594,14 @@ struct drm_amdgpu_cs_chunk_sem {
__u32 handle;
  };
  
+struct drm_amdgpu_cs_chunk_syncobj {

+   __u32 handle;
+   __u32 pad;
+   __u64 point;
+   __u64 flags;
+};


Sure it's nice to be forward-looking, but can't we just put the flags 
into the padding?


Cheers,
Nicolai



+
+
  #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ0
  #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ_FD 1
  #define AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD   2




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.


[PATCH] drm/amdgpu: fix user fence write race condition

2018-06-29 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The buffer object backing the user fence is reserved using the non-user
fence, i.e., as soon as the non-user fence is signaled, the user fence
buffer object can be moved or even destroyed.

Therefore, emit the user fence first.

Both fences have the same cache invalidation behavior, so this should
have no user-visible effect.
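
The race can be modeled abstractly: the BO's lifetime is tied to the non-user fence, so a user-fence write ordered after that fence may target reclaimed memory. A toy model, purely illustrative and not kernel code:

```python
# Toy model (not kernel code) of the ordering bug: the user-fence BO may be
# reclaimed as soon as the non-user fence signals, so the user-fence write
# must be ordered before the non-user fence in the command stream.
class UserFenceBO:
    def __init__(self):
        self.alive = True
        self.seq = None

    def write(self, seq):
        if not self.alive:
            raise RuntimeError("use-after-free: BO was reclaimed")
        self.seq = seq

def submit(bo, user_fence_first):
    # In this toy, the non-user fence "signals" as soon as it is emitted,
    # which is the worst case the patch guards against.
    ops = ["user_fence", "fence"] if user_fence_first else ["fence", "user_fence"]
    for op in ops:
        if op == "fence":
            bo.alive = False  # memory manager may reclaim the BO from here on
        else:
            bo.write(42)
```

With `user_fence_first=True` (the patched order) the write lands while the BO is still alive; with the old order the toy raises, mirroring the potential use-after-free.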

Signed-off-by: Nicolai Hähnle 
---
There is one aspect to this change that I'm a bit unsure about: what does
insert_end do? It's only used by UVD & friends, and since those rings
don't use user fences I guess this patch doesn't really change anything
for them. And having the insert_end between those fences always looked
a bit suspicious...
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 829c4d2a33b9..8117b8c2113e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -227,38 +227,38 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
if (ring->funcs->emit_tmz)
amdgpu_ring_emit_tmz(ring, false);
 
if (ring->funcs->emit_hdp_invalidate
 #ifdef CONFIG_X86_64
&& !(adev->flags & AMD_IS_APU)
 #endif
   )
amdgpu_ring_emit_hdp_invalidate(ring);
 
+   /* wrap the last IB with fence */
+   if (job && job->uf_addr) {
+   amdgpu_ring_emit_fence(ring, job->uf_addr, job->uf_sequence,
+  AMDGPU_FENCE_FLAG_64BIT);
+   }
+
r = amdgpu_fence_emit(ring, f);
if (r) {
dev_err(adev->dev, "failed to emit fence (%d)\n", r);
if (job && job->vmid)
amdgpu_vmid_reset(adev, ring->funcs->vmhub, job->vmid);
amdgpu_ring_undo(ring);
return r;
}
 
if (ring->funcs->insert_end)
ring->funcs->insert_end(ring);
 
-   /* wrap the last IB with fence */
-   if (job && job->uf_addr) {
-   amdgpu_ring_emit_fence(ring, job->uf_addr, job->uf_sequence,
-  AMDGPU_FENCE_FLAG_64BIT);
-   }
-
if (patch_offset != ~0 && ring->funcs->patch_cond_exec)
amdgpu_ring_patch_cond_exec(ring, patch_offset);
 
ring->current_ctx = fence_ctx;
if (vm && ring->funcs->emit_switch_buffer)
amdgpu_ring_emit_switch_buffer(ring);
amdgpu_ring_commit(ring);
return 0;
 }
 
-- 
2.14.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 3/3] drm/amdgpu: Add plumbing for handling SQ EDC/ECC interrupts v2.

2018-06-11 Thread Nicolai Hähnle

Looks good to me. Series:

Reviewed-by: Nicolai Hähnle 

On 08.06.2018 00:19, Andrey Grodzovsky wrote:

From: David Panariti 

The SQ can generate interrupts; install an ISR to
handle the SQ interrupts.

Add parsing of the SQ data in the interrupt handler.

v2:
Remove CZ only limitation.
Rebase.

Signed-off-by: David Panariti 
Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 109 +-
  1 file changed, 108 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index b96dc08..9e6f4f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -2054,6 +2054,14 @@ static int gfx_v8_0_sw_init(void *handle)
if (r)
return r;
  
+	/* SQ interrupts. */

+   r = amdgpu_irq_add_id(adev, AMDGPU_IH_CLIENTID_LEGACY, 239,
+ &adev->gfx.sq_irq);
+   if (r) {
+   DRM_ERROR("amdgpu_irq_add() for SQ failed: %d\n", r);
+   return r;
+   }
+
adev->gfx.gfx_current_status = AMDGPU_GFX_NORMAL_MODE;
  
  	gfx_v8_0_scratch_init(adev);

@@ -5119,6 +5127,8 @@ static int gfx_v8_0_hw_fini(void *handle)
  
  	amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
  
+	amdgpu_irq_put(adev, &adev->gfx.sq_irq, 0);

+
/* disable KCQ to avoid CPC touch memory not valid anymore */
for (i = 0; i < adev->gfx.num_compute_rings; i++)
gfx_v8_0_kcq_disable(&adev->gfx.kiq.ring, 
&adev->gfx.compute_ring[i]);
@@ -5556,6 +5566,14 @@ static int gfx_v8_0_late_init(void *handle)
return r;
}
  
+	r = amdgpu_irq_get(adev, &adev->gfx.sq_irq, 0);

+   if (r) {
+   DRM_ERROR(
+   "amdgpu_irq_get() failed to get IRQ for SQ, r: %d.\n",
+   r);
+   return r;
+   }
+
amdgpu_device_ip_set_powergating_state(adev,
   AMD_IP_BLOCK_TYPE_GFX,
   AMD_PG_STATE_GATE);
@@ -6846,6 +6864,32 @@ static int gfx_v8_0_set_cp_ecc_int_state(struct amdgpu_device *adev,
return 0;
  }
  
+static int gfx_v8_0_set_sq_int_state(struct amdgpu_device *adev,

+struct amdgpu_irq_src *source,
+unsigned int type,
+enum amdgpu_interrupt_state state)
+{
+   int enable_flag;
+
+   switch (state) {
+   case AMDGPU_IRQ_STATE_DISABLE:
+   enable_flag = 1;
+   break;
+
+   case AMDGPU_IRQ_STATE_ENABLE:
+   enable_flag = 0;
+   break;
+
+   default:
+   return -EINVAL;
+   }
+
+   WREG32_FIELD(SQ_INTERRUPT_MSG_CTRL, STALL,
+enable_flag);
+
+   return 0;
+}
+
  static int gfx_v8_0_eop_irq(struct amdgpu_device *adev,
struct amdgpu_irq_src *source,
struct amdgpu_iv_entry *entry)
@@ -6900,7 +6944,62 @@ static int gfx_v8_0_cp_ecc_error_irq(struct amdgpu_device *adev,
 struct amdgpu_irq_src *source,
 struct amdgpu_iv_entry *entry)
  {
-   DRM_ERROR("ECC error detected.");
+   DRM_ERROR("CP EDC/ECC error detected.");
+   return 0;
+}
+
+static int gfx_v8_0_sq_irq(struct amdgpu_device *adev,
+  struct amdgpu_irq_src *source,
+  struct amdgpu_iv_entry *entry)
+{
+   u8 enc, se_id;
+   char type[20];
+
+   /* Parse all fields according to SQ_INTERRUPT* registers */
+   enc = (entry->src_data[0] >> 26) & 0x3;
+   se_id = (entry->src_data[0] >> 24) & 0x3;
+
+   switch (enc) {
+   case 0:
+   DRM_INFO("SQ general purpose intr detected:"
+   "se_id %d, immed_overflow %d, host_reg_overflow %d,"
+   "host_cmd_overflow %d, cmd_timestamp %d,"
+   "reg_timestamp %d, thread_trace_buff_full %d,"
+   "wlt %d, thread_trace %d.\n",
+   se_id,
+   (entry->src_data[0] >> 7) & 0x1,
+   (entry->src_data[0] >> 6) & 0x1,
+   (entry->src_data[0] >> 5) & 0x1,
+   (entry->src_data[0] >> 4) & 0x1,
+   (entry->src_data[0] >> 3) & 0x1,
+   (entry->src_data[0] >> 2)

Re: [PATCH 3/3] drm/amdgpu: Add plumbing for handling SQ EDC/ECC interrupts.

2018-06-05 Thread Nicolai Hähnle

On 05.06.2018 15:17, Andrey Grodzovsky wrote:

From: David Panariti 

The SQ can generate interrupts; install an ISR to
handle the SQ interrupts.

Add parsing of the SQ data in the interrupt handler.

Signed-off-by: David Panariti 
Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 110 +-
  1 file changed, 109 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index a19fcc6..c4a2c3d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -2058,6 +2058,15 @@ static int gfx_v8_0_sw_init(void *handle)
}
}
  
+	/* SQ interrupts. */

+   /* @todo XXX is this CZ only? */


The SQ interrupt source in general is the same on all GCN as far as I know.

EDC/ECC (which is the context for this change) is only available on some 
chips, and I don't remember off the top of my head which ones, but it 
really doesn't matter for this commit: we can install the IRQ on all 
chips, and the EDC/ECC path simply never triggers for non-ECC enabled 
hardware. There are still the other potential IRQ causes for SQ.


Cheers,
Nicolai




+   r = amdgpu_irq_add_id(adev, AMDGPU_IH_CLIENTID_LEGACY, 239,
+ &adev->gfx.sq_irq);
+   if (r) {
+   DRM_ERROR("amdgpu_irq_add() for SQ failed: %d\n", r);
+   return r;
+   }
+
adev->gfx.gfx_current_status = AMDGPU_GFX_NORMAL_MODE;
  
  	gfx_v8_0_scratch_init(adev);

@@ -5122,6 +5131,8 @@ static int gfx_v8_0_hw_fini(void *handle)
amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
if (adev->asic_type == CHIP_CARRIZO)
amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
+   /* @todo XXX Is this CZ only? */
+   amdgpu_irq_put(adev, &adev->gfx.sq_irq, 0);
  
  	/* disable KCQ to avoid CPC touch memory not valid anymore */

for (i = 0; i < adev->gfx.num_compute_rings; i++)
@@ -5561,6 +5572,14 @@ static int gfx_v8_0_late_init(void *handle)
return r;
}
}
+   /* @todo XXX Is this CZ only? */
+   r = amdgpu_irq_get(adev, &adev->gfx.sq_irq, 0);
+   if (r) {
+   DRM_ERROR(
+   "amdgpu_irq_get() failed to get IRQ for SQ, r: %d.\n",
+   r);
+   return r;
+   }
  
  	amdgpu_device_ip_set_powergating_state(adev,

   AMD_IP_BLOCK_TYPE_GFX,
@@ -6852,6 +6871,32 @@ static int gfx_v8_0_set_cp_ecc_int_state(struct amdgpu_device *adev,
return 0;
  }
  
+static int gfx_v8_0_set_sq_int_state(struct amdgpu_device *adev,

+struct amdgpu_irq_src *source,
+unsigned int type,
+enum amdgpu_interrupt_state state)
+{
+   int enable_flag;
+
+   switch (state) {
+   case AMDGPU_IRQ_STATE_DISABLE:
+   enable_flag = 1;
+   break;
+
+   case AMDGPU_IRQ_STATE_ENABLE:
+   enable_flag = 0;
+   break;
+
+   default:
+   return -EINVAL;
+   }
+
+   WREG32_FIELD(SQ_INTERRUPT_MSG_CTRL, STALL,
+enable_flag);
+
+   return 0;
+}
+
  static int gfx_v8_0_eop_irq(struct amdgpu_device *adev,
struct amdgpu_irq_src *source,
struct amdgpu_iv_entry *entry)
@@ -6906,7 +6951,62 @@ static int gfx_v8_0_cp_ecc_error_irq(struct amdgpu_device *adev,
 struct amdgpu_irq_src *source,
 struct amdgpu_iv_entry *entry)
  {
-   DRM_ERROR("ECC error detected.");
+   DRM_ERROR("CP EDC/ECC error detected.");
+   return 0;
+}
+
+static int gfx_v8_0_sq_irq(struct amdgpu_device *adev,
+  struct amdgpu_irq_src *source,
+  struct amdgpu_iv_entry *entry)
+{
+   u8 enc, se_id;
+   char type[20];
+
+   /* Parse all fields according to SQ_INTERRUPT* registers */
+   enc = (entry->src_data[0] >> 26) & 0x3;
+   se_id = (entry->src_data[0] >> 24) & 0x3;
+
+   switch (enc) {
+   case 0:
+   DRM_INFO("SQ general purpose intr detected:"
+   "se_id %d, immed_overflow %d, host_reg_overflow %d,"
+   "host_cmd_overflow %d, cmd_timestamp %d,"
+   "reg_timestamp %d, thread_trace_buff_full %d,"
+   "wlt %d, thread_trace %d.\n",
+   se_id,
+   (entry->src_data[0] >> 7) & 0x1,
+   (entry->src_data[0] >> 6) & 0x1,
+   (entry->src_data[0] >> 5) & 0x1,

[PATCH] drm/amdgpu: set COMPUTE_PGM_RSRC1 for SGPR/VGPR clearing shaders

2018-04-23 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Otherwise, the SQ may skip some of the register writes, or shader waves may
be allocated where we don't expect them, with the result that we don't actually
reset all of the register SRAMs. This can lead to spurious ECC errors later on
if a shader uses an uninitialized register.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=198883
Signed-off-by: Nicolai Hähnle 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index a2d77bcf9a78..bdce864ab8fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1452,64 +1452,67 @@ static const u32 sgpr_init_compute_shader[] =
0xbee60004, 0xbee70005,
0xbeea0006, 0xbeeb0007,
0xbee80008, 0xbee90009,
0xbefc, 0xbf8a,
0xbf81, 0x,
 };
 
 static const u32 vgpr_init_regs[] =
 {
mmCOMPUTE_STATIC_THREAD_MGMT_SE0, 0x,
-   mmCOMPUTE_RESOURCE_LIMITS, 0,
+   mmCOMPUTE_RESOURCE_LIMITS, 0x100, /* CU_GROUP_COUNT=1 */
mmCOMPUTE_NUM_THREAD_X, 256*4,
mmCOMPUTE_NUM_THREAD_Y, 1,
mmCOMPUTE_NUM_THREAD_Z, 1,
+   mmCOMPUTE_PGM_RSRC1, 0x14f, /* VGPRS=15 (64 logical VGPRs), SGPRS=1 (16 SGPRs), BULKY=1 */
mmCOMPUTE_PGM_RSRC2, 20,
mmCOMPUTE_USER_DATA_0, 0xedcedc00,
mmCOMPUTE_USER_DATA_1, 0xedcedc01,
mmCOMPUTE_USER_DATA_2, 0xedcedc02,
mmCOMPUTE_USER_DATA_3, 0xedcedc03,
mmCOMPUTE_USER_DATA_4, 0xedcedc04,
mmCOMPUTE_USER_DATA_5, 0xedcedc05,
mmCOMPUTE_USER_DATA_6, 0xedcedc06,
mmCOMPUTE_USER_DATA_7, 0xedcedc07,
mmCOMPUTE_USER_DATA_8, 0xedcedc08,
mmCOMPUTE_USER_DATA_9, 0xedcedc09,
 };
 
 static const u32 sgpr1_init_regs[] =
 {
mmCOMPUTE_STATIC_THREAD_MGMT_SE0, 0x0f,
-   mmCOMPUTE_RESOURCE_LIMITS, 0x100,
+   mmCOMPUTE_RESOURCE_LIMITS, 0x100, /* CU_GROUP_COUNT=1 */
mmCOMPUTE_NUM_THREAD_X, 256*5,
mmCOMPUTE_NUM_THREAD_Y, 1,
mmCOMPUTE_NUM_THREAD_Z, 1,
+   mmCOMPUTE_PGM_RSRC1, 0x240, /* SGPRS=9 (80 GPRS) */
mmCOMPUTE_PGM_RSRC2, 20,
mmCOMPUTE_USER_DATA_0, 0xedcedc00,
mmCOMPUTE_USER_DATA_1, 0xedcedc01,
mmCOMPUTE_USER_DATA_2, 0xedcedc02,
mmCOMPUTE_USER_DATA_3, 0xedcedc03,
mmCOMPUTE_USER_DATA_4, 0xedcedc04,
mmCOMPUTE_USER_DATA_5, 0xedcedc05,
mmCOMPUTE_USER_DATA_6, 0xedcedc06,
mmCOMPUTE_USER_DATA_7, 0xedcedc07,
mmCOMPUTE_USER_DATA_8, 0xedcedc08,
mmCOMPUTE_USER_DATA_9, 0xedcedc09,
 };
 
 static const u32 sgpr2_init_regs[] =
 {
mmCOMPUTE_STATIC_THREAD_MGMT_SE0, 0xf0,
mmCOMPUTE_RESOURCE_LIMITS, 0x100,
mmCOMPUTE_NUM_THREAD_X, 256*5,
mmCOMPUTE_NUM_THREAD_Y, 1,
mmCOMPUTE_NUM_THREAD_Z, 1,
+   mmCOMPUTE_PGM_RSRC1, 0x240, /* SGPRS=9 (80 GPRS) */
mmCOMPUTE_PGM_RSRC2, 20,
mmCOMPUTE_USER_DATA_0, 0xedcedc00,
mmCOMPUTE_USER_DATA_1, 0xedcedc01,
mmCOMPUTE_USER_DATA_2, 0xedcedc02,
mmCOMPUTE_USER_DATA_3, 0xedcedc03,
mmCOMPUTE_USER_DATA_4, 0xedcedc04,
mmCOMPUTE_USER_DATA_5, 0xedcedc05,
mmCOMPUTE_USER_DATA_6, 0xedcedc06,
mmCOMPUTE_USER_DATA_7, 0xedcedc07,
mmCOMPUTE_USER_DATA_8, 0xedcedc08,
-- 
2.14.1



Re: Raven Ridge Ryzen 2500U hang reproduced

2018-04-16 Thread Nicolai Hähnle

On 14.04.2018 00:24, Bráulio Bhavamitra wrote:
It ALWAYS crashes on shader15 of 
http://www.graphicsfuzz.com/benchmark/android-v1.html.


This is very likely an unrelated issue to any kind of desktop hang 
you're seeing. The graphics fuzz shaders are the result of fuzzing to 
intentionally generate unusual control flow structures which are likely 
to trigger shader compiler bugs. Typical desktop workloads don't have 
such shaders, so any generic desktop hang you're seeing is almost 
certainly unrelated.


Cheers,
Nicolai




Also reported at https://bugzilla.redhat.com/show_bug.cgi?id=1562530

Using kernel 4.16 with options rcu_nocb=0-15 and amdgpu.dpm=0

Cheers,
Bráulio

On Mon, Mar 26, 2018 at 8:30 PM Bráulio Bhavamitra wrote:


Hi all,

Following the random crashes happening with many users (e.g.

https://www.phoronix.com/scan.php?page=news_item&px=Raven-Ridge-March-Update),
not only on Linux but also Windows, I've been struggling to
reproduce and generate any error log.

After discovering that the error only happened with KDE and games
(at least for me, see https://bugs.kde.org/show_bug.cgi?id=392378),
I could reproduce it after a failed suspend.

The crash usually leaves the mouse able to keep moving, but nothing
else works. Except this time the keyboard worked, so I could switch
the tty and save the dmesg messages. After that I had to force a
reboot, as it got stuck trying to kill the lightdm service
(GPU hung?).

The errors are, see attached the full dmesg:
[ 2899.525650] amdgpu :03:00.0: couldn't schedule ib on ring 
[ 2899.525769] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
scheduling IBs (-22)
[ 2909.125047] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, last signaled seq=174624, last emitted seq=174627
[ 2909.125060] [drm] IP block:psp is hung!
[ 2909.125063] [drm] GPU recovery disabled.
[ 2914.756931] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR*
amdgpu_cs_list_validate(validated) failed.
[ 2914.756997] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
process the buffer list -16!
[ 2914.997372] amdgpu :03:00.0: couldn't schedule ib on ring 
[ 2914.997498] [drm:amdgpu_job_run [amdgpu]] *ERROR* Error
scheduling IBs (-22)
[ 2930.117275] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR*
amdgpu_cs_list_validate(validated) failed.
[ 2930.117405] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
process the buffer list -16!
[ 2930.152015] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to
clear memory with ring turned off.
[ 2930.157940] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to
clear memory with ring turned off.
[ 2930.180535] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to
clear memory with ring turned off.
[ 2933.781692] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0: link becomes ready
[ 2945.477205] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR*
amdgpu_cs_list_validate(validated) failed.
[ 2945.477348] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
process the buffer list -16!

System details:
HP Envy x360 Ryzen 2500U
ArchLinux, kernel 4.16rc6 and 4.15.12

Cheers,
bráulio







--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.


Re: RFC for a render API to support adaptive sync and VRR

2018-04-12 Thread Nicolai Hähnle

On 12.04.2018 01:30, Cyr, Aric wrote:

At least with VDPAU, video players are already explicitly specifying the
target presentation time, so no changes should be required at that
level. Don't know about other video APIs.

The X11 Present extension protocol is also prepared for specifying the
target presentation time already, the support for it just needs to be
implemented.


I'm perfectly OK with presentation time-based *API*.  I get it from a user 
mode/app perspective, and that's fine.  We need that feedback and would like 
help defining those portions of the stack.
However, I think it doesn't make as much sense as a *DDI* because it doesn't 
correspond to any hardware real or logical (i.e. no one would implement it in 
HW this way) and the industry specs aren't defined that way.
You can have libdrm or some other usermode component translate your presentation time 
into a frame duration and schedule it.  What's the advantage of having this in kernel 
besides the fact we lose the intent of the application and could prevent features and 
optimizations.  When it gets to kernel, I think it is much more elegant for the flip 
structure to contain a simple duration that says "hey, show this frame on the screen 
for this long".  Then we don't need any clocks or timers just some simple math and 
program the hardware.


There isn't necessarily an inherent advantage to having this translation 
in the kernel. However, we *must* do this translation in a place that is 
owned by display experts (i.e., you guys), because only you guys know 
how to actually do that translation reliably and correctly.


Since your work is currently limited to the kernel, it makes sense to do 
it in the kernel.


If the translation doesn't happen in a place that you feel comfortable 
working on, we're setting ourselves up for a future where this 
hypothetical future UMD component will get this wrong, and there'll be a 
lot of finger-pointing between you guys and whoever writes that UMD, 
with likely little willingness to actually go into the respective other 
codebase to fix what's wrong. And that's a pretty sucky future.


Cheers,
Nicolai

P.S.: I'm also a little surprised that you seem to be saying that 
requesting a target present time is basically impossible (at least, 
that's kind of implied by your statement about mGPUs), and yet there's 
precedent for such APIs in both Vulkan and VDPAU.





In short,
  1) We can simplify media players' lives by helping them get really, really 
close to their content rate, so they wouldn't need any frame rate conversion.
  They'll still need A/V syncing though, and variable refresh cannot solve 
this and thus is way out of scope of what we're proposing.

  2) For gaming, don't even try to guess a frame duration, the driver/hardware 
will do a better job every time, just specify duration=0 and flip as fast as 
you can.

Regards,
   Aric

P.S. Thanks for the Croteam link.  Interesting, but basically nullified by 
variable refresh rate displays.  You won't have 
stuttering/microstuttering/juddering/tearing if your display's refresh rate 
matches the render/present rate of the game.  Maybe I should grab The Talos 
Principle to see how well it works with FreeSync display :)

--
ARIC CYR
PMTS Software Engineer | SW – Display Technologies








Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Nicolai Hähnle

On 10.04.2018 23:45, Cyr, Aric wrote:

For video games we have a similar situation where a frame is rendered
for a certain world time and in the ideal case we would actually
display the frame at this world time.


That seems like it would be a poorly written game that flips like
that, unless they are explicitly trying to throttle the framerate for
some reason.  When a game presents a completed frame, they’d like
that to happen as soon as possible.


What you're describing is what most games have been doing traditionally.
Croteam's research shows that this results in micro-stuttering, because
frames may be presented too early. To avoid that, they want to
explicitly time each presentation as described by Christian.


Yes, I agree completely.  However that's only truly relevant for fixed
refreshed rate displays.


No, it also affects variable refresh; possibly even more in some cases,
because the presentation time is less predictable.


Yes, and that's why you don't want to do it when you have variable refresh.  
The hardware in the monitor and GPU will do it for you,

so why bother?

I think Michel's point is that the monitor and GPU hardware *cannot*
really do this, because there's synchronization with audio to take into
account, which the GPU or monitor don't know about.


How does it work fine today, given that all the kernel seems to know is 'current' or 
'current+1' vsyncs?
Presumably the applications somehow schedule all this just fine.
If this works without variable refresh for 60Hz, will it not work for a fixed-rate 
"48Hz" monitor (assuming a 24Hz video)?


You're right. I guess a better way to state the point is that it 
*doesn't* really work today with fixed refresh, but if we're going to 
introduce a new API, then why not do so in a way that can fix these 
additional problems as well?




Also, as I wrote separately, there's the case of synchronizing multiple
monitors.


For multimonitor to work with VRR, they'll have to be timing and flip 
synchronized.
This is impossible for an application to manage, it needs driver/HW control or 
you end up with one display flipping before the other and it looks terrible.
And definitely forget about multiGPU without professional workstation-type 
support needed to sync the displays across adapters.


I'm not a display expert, but I find it hard to believe that it's that 
difficult. Perhaps you can help us understand?


Say you have a multi-GPU system, and each GPU has multiple displays 
attached, and a single application is driving them all. The application 
queues flips for all displays with the same target_present_time_ns 
attribute. Starting at some time T, the application simply asks for the 
same present time T + i * 1667 (or whatever) for frame i from all 
displays.


Of course it's to be expected that some (or all) of the displays will 
not be able to hit the target time on the first bunch of flips due to 
hardware limitations, but as long as the range of supported frame times 
is wide enough, I'd expect all of them to drift towards presenting at 
the correct time eventually, even across multiple GPUs, with this simple 
scheme.


Why would that not work to sync up all displays almost perfectly?


[snip]

Are there any real problems with exposing an absolute target present time?


Realistically, how far into the future are you requesting a presentation time? 
Won't it almost always be something like current_time+1000/video_frame_rate?
If so, why not just tell the driver to set 1000/video_frame_rate and have the 
GPU/monitor create nicely spaced VSYNCs for you that match the source content?

In fact, you probably wouldn't even need to change your video player at all, 
other than having it pass the target_frame_duration_ns.  You could consider 
this a 'hint' as you suggested, since it cannot be guaranteed in cases where your 
driver or HW doesn't support variable refresh.  If the target_frame_duration_ns 
hint is supported/applied, then the video app should have nothing extra to do 
that it wouldn't already do for any arbitrary fixed-refresh rate display.  If 
not supported (say the drm_atomic_check fails with -EINVAL or something), the 
video app can fall back and stop requesting a fixed target_frame_duration_ns.

A fundamental problem I have with a target present time though is how to accommodate 
present times that are larger than one VSYNC time?  If my monitor has a 40Hz-60Hz 
variable refresh, it's easy to translate "my content is 24Hz, repeat this next frame 
an integer multiple number of times so that it lands within the monitor range".  
Driver fixes display to an even 48Hz and everything good (no worse than a 30Hz clip on a 
traditional 60Hz display anyways).  This frame-doubling is all hardware based and doesn't 
require any polling.

Now if you change that to "show my content in at least X nanoseconds" it can work on all 
displays, but the intent of the app is gone and driver/GPU/display cannot optimize.  For example, 
the HDMI VRR spec defines a "CinemaVRR

Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Nicolai Hähnle

On 10.04.2018 19:25, Cyr, Aric wrote:

-Original Message-
From: Michel Dänzer [mailto:mic...@daenzer.net]
Sent: Tuesday, April 10, 2018 13:16

On 2018-04-10 07:13 PM, Cyr, Aric wrote:

-Original Message-
From: Michel Dänzer [mailto:mic...@daenzer.net]
Sent: Tuesday, April 10, 2018 13:06
On 2018-04-10 06:26 PM, Cyr, Aric wrote:

From: Koenig, Christian Sent: Tuesday, April 10, 2018 11:43


For video games we have a similar situation where a frame is rendered
for a certain world time and in the ideal case we would actually
display the frame at this world time.


That seems like it would be a poorly written game that flips like
that, unless they are explicitly trying to throttle the framerate for
some reason.  When a game presents a completed frame, they’d like
that to happen as soon as possible.


What you're describing is what most games have been doing traditionally.
Croteam's research shows that this results in micro-stuttering, because
frames may be presented too early. To avoid that, they want to
explicitly time each presentation as described by Christian.


Yes, I agree completely.  However that's only truly relevant for fixed
refreshed rate displays.


No, it also affects variable refresh; possibly even more in some cases,
because the presentation time is less predictable.


Yes, and that's why you don't want to do it when you have variable refresh.  
The hardware in the monitor and GPU will do it for you, so why bother?


I think Michel's point is that the monitor and GPU hardware *cannot* 
really do this, because there's synchronization with audio to take into 
account, which the GPU or monitor don't know about.


Also, as I wrote separately, there's the case of synchronizing multiple 
monitors.




The input to their algorithms will be noisy, causing worse estimates.  If you 
just present as fast as you can, it'll just work (within reason).
The majority of gamers want maximum FPS for their games, and there's quite 
frequently outrage at a particular game when they are limited to something 
lower than what their monitor could otherwise support (i.e. I don't want my 
game limited to 30Hz if I have a shiny 144Hz gaming display I paid good money 
for).   Of course, there's always exceptions... but in our experience those are 
few and far between.


I agree that games most likely shouldn't try to be smart. I'm curious 
about the Croteam findings, but even if they did a really clever thing 
that works better than just telling the display driver "display ASAP 
please", chances are that *most* developers won't do that. And they'll 
most likely get it wrong, so our guidance should really be "games should 
ask for ASAP presentation, and nothing else".


However, there *are* legitimate use cases for requesting a specific 
presentation time, and there *is* precedent of APIs that expose such 
features.


Are there any real problems with exposing an absolute target present time?

Cheers,
Nicolai



Re: RFC for a render API to support adaptive sync and VRR

2018-04-10 Thread Nicolai Hähnle

On 10.04.2018 18:26, Cyr, Aric wrote:
That presentation time doesn’t need to come to kernel as such and 
actually is fine as-is completely decoupled from adaptive sync.  As long 
as the video player provides the new target_frame_duration_ns on the 
flip, then the driver/HW will target the correct refresh rate to match 
the source content.  This simply means that more often than not the 
video presents will  align very close to the monitor’s refresh rate, 
resulting in a smooth video experience.  For example, if you have 24Hz 
content, and an adaptive sync monitor with a range of 40-60Hz, once the 
target_frame_duration_ns is provided, driver can configure the monitor 
to a fixed refresh rate of 48Hz causing all video presents to be 
frame-doubled in hardware without further application intervention.


What about multi-monitor displays, where you want to play an animation 
that spans multiple monitors. You really want all monitors to flip at 
the same time.


I understand where you're coming from, but the perspective of refusing a 
target presentation time is a rather selfish one of "we're the display, 
we're the most important, everybody else has to adjust to us" (e.g. to 
get perfect sync between video and audio). I admit I'm phrasing it in a 
bit of an extreme way, but perhaps this phrasing helps to see why that's 
just not a very good attitude to have.


All devices (whether video or audio or whatever) should be able to 
receive a target presentation time.


If the application can make your life a bit easier by providing the 
targetted refresh rate as additional *hint-only* parameter (like in your 
24 Hz --> 48 Hz doubling example), then maybe we should indeed consider 
that.


Cheers,
Nicolai





For video games we have a similar situation where a frame is rendered 
for a certain world time and in the ideal case we would actually display 
the frame at this world time.


That seems like it would be a poorly written game that flips like that, 
unless they are explicitly trying to throttle the framerate for some 
reason.  When a game presents a completed frame, they’d like that to 
happen as soon as possible.  This is why non-VSYNC modes of flipping 
exist and many games leverage this.  Adaptive sync gives you the lower 
latency of immediate flips without the tearing imposed by using 
non-VSYNC flipping.



I mean we have the guys from Valve on this mailing list so I think we 
should just get the feedback from them and see what they prefer.


We have thousands of Steam games on other OSes that work great already, 
but we’d certainly be interested in any additional feedback.  My guess 
is they prefer to “do nothing” and let driver/HW manage it, otherwise 
you exclude all existing games from supporting adaptive sync without a 
rewrite or update.



Regards,
Christian.


-Aric





Re: [PATCH] drm/amdgpu/gfx9: cache DB_DEBUG2 and make it available to userspace

2018-04-10 Thread Nicolai Hähnle

Thanks!

Acked-by: Nicolai Hähnle 


On 10.04.2018 17:18, Alex Deucher wrote:

Userspace needs to query this value to work around a hw bug in
certain cases.

Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   | 2 ++
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 1 +
  drivers/gpu/drm/amd/amdgpu/soc15.c| 3 +++
  3 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ed5c22bfa3e5..09fa37e9a840 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -867,6 +867,8 @@ struct amdgpu_gfx_config {
  
  	/* gfx configure feature */

uint32_t double_offchip_lds_buf;
+   /* cached value of DB_DEBUG2 */
+   uint32_t db_debug2;
  };
  
  struct amdgpu_cu_info {

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 9d39fd5b1822..66bd6c1c82c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1600,6 +1600,7 @@ static void gfx_v9_0_gpu_init(struct amdgpu_device *adev)
  
  	gfx_v9_0_setup_rb(adev);

gfx_v9_0_get_cu_info(adev, &adev->gfx.cu_info);
+   adev->gfx.config.db_debug2 = RREG32_SOC15(GC, 0, mmDB_DEBUG2);
  
  	/* XXX SH_MEM regs */

/* where to put LDS, scratch, GPUVM in FSA64 space */
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 2e9ebe8db5cc..65e781f05c24 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -287,6 +287,7 @@ static struct soc15_allowed_register_entry soc15_allowed_read_registers[] = {
{ SOC15_REG_ENTRY(GC, 0, mmCP_CPC_STALLED_STAT1)},
{ SOC15_REG_ENTRY(GC, 0, mmCP_CPC_STATUS)},
{ SOC15_REG_ENTRY(GC, 0, mmGB_ADDR_CONFIG)},
+   { SOC15_REG_ENTRY(GC, 0, mmDB_DEBUG2)},
  };
  
  static uint32_t soc15_read_indexed_register(struct amdgpu_device *adev, u32 se_num,

@@ -315,6 +316,8 @@ static uint32_t soc15_get_register_value(struct amdgpu_device *adev,
} else {
if (reg_offset == SOC15_REG_OFFSET(GC, 0, mmGB_ADDR_CONFIG))
return adev->gfx.config.gb_addr_config;
+   else if (reg_offset == SOC15_REG_OFFSET(GC, 0, mmDB_DEBUG2))
+   return adev->gfx.config.db_debug2;
return RREG32(reg_offset);
}
  }




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.


Re: Add SQ_WAVE_TTMPx support for gfx9

2018-04-07 Thread Nicolai Hähnle

I don't think this is the right approach.

TTMPs are really just SGPRs. They're only special because access to them 
is restricted to trap handlers.


Note how the index ixSQ_WAVE_TTMP0 is 0x26c. 0x200 is the base for 
reading SGPRs, and 0x6c is the operand encoding of TTMP0.


I think umr should just use the SGPR read path with the correct index.

Thanks,
Nicolai

On 06.04.2018 16:51, Tom St Denis wrote:

Patches attached for both umr/kernel.

Tested on my Raven1.  I'll circle back to adding gfx8 after lunch.

Tom








Re: [RFC] Per file OOM badness

2018-01-30 Thread Nicolai Hähnle

On 30.01.2018 12:34, Michel Dänzer wrote:

On 2018-01-30 12:28 PM, Christian König wrote:

Am 30.01.2018 um 12:02 schrieb Michel Dänzer:

On 2018-01-30 11:40 AM, Christian König wrote:

Am 30.01.2018 um 10:43 schrieb Michel Dänzer:

[SNIP]

Would it be ok to hang onto potentially arbitrary mmget references
essentially forever? If that's ok I think we can do your process based
account (minus a few minor inaccuracies for shared stuff perhaps,
but no
one cares about that).

Honestly, I think you and Christian are overthinking this. Let's try
charging the memory to every process which shares a buffer, and go from
there.

My problem is that this needs to be bulletproof.

For example, imagine an application which allocates a lot of BOs, then
calls fork() and lets the parent process die. The file descriptor lives
on in the child process, but the memory is not accounted against the
child.

What exactly are you referring to by "the file descriptor" here?


The file descriptor used to identify the connection to the driver. In
other words our drm_file structure in the kernel.


What happens to BO handles in general in this case? If both parent and
child process keep the same handle for the same BO, one of them
destroying the handle will result in the other one not being able to use
it anymore either, won't it?

Correct.

That usage is actually not useful at all, but we already had
applications which did exactly that by accident.

Not to mention that somebody could do it on purpose.


Can we just prevent child processes from using their parent's DRM file
descriptors altogether? Allowing it seems like a bad idea all around.


Existing protocols pass DRM fds between processes though, don't they?

Not child processes perhaps, but special-casing that seems like awful 
design.


Cheers,
Nicolai


Re: [RFC] Per file OOM badness

2018-01-30 Thread Nicolai Hähnle

On 30.01.2018 11:48, Michel Dänzer wrote:

On 2018-01-30 11:42 AM, Daniel Vetter wrote:

On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:

On 2018-01-30 10:31 AM, Daniel Vetter wrote:


I guess a good first order approximation would be if we simply charge any
newly allocated buffers to the process that created them, but that means
hanging onto lots of mm_struct pointers since we want to make sure we then
release those pages to the right mm again (since the process that drops
the last ref might be a totally different one, depending upon how the
buffers or DRM fd have been shared).

Would it be ok to hang onto potentially arbitrary mmget references
essentially forever? If that's ok I think we can do your process based
account (minus a few minor inaccuracies for shared stuff perhaps, but no
one cares about that).


Honestly, I think you and Christian are overthinking this. Let's try
charging the memory to every process which shares a buffer, and go from
there.


I'm not concerned about wrongly accounting shared buffers (they don't
matter), but imbalanced accounting. I.e. allocate a buffer in the client,
share it, but then the compositor drops the last reference.


I don't think the order matters. The memory is "uncharged" in each
process when it drops its reference.


Daniel made a fair point about passing DRM fds between processes, though.

It's not a problem with how the fds are currently used, but somebody 
could do the following:


1. Create a DRM fd in process A, allocate lots of buffers.
2. Pass the fd to process B via some IPC mechanism.
3. Exit process A.

There needs to be some assurance that the BOs are accounted as belonging 
to process B in the end.


Cheers,
Nicolai


Re: [PATCH umr] Switch GRBM index when reading wave data directly.

2017-11-09 Thread Nicolai Hähnle

Reviewed-by: Nicolai Hähnle 

On 08.11.2017 19:39, Tom St Denis wrote:

Signed-off-by: Tom St Denis 
---
  src/lib/wave_status.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/src/lib/wave_status.c b/src/lib/wave_status.c
index fe2add779fdd..7f0130bb9347 100644
--- a/src/lib/wave_status.c
+++ b/src/lib/wave_status.c
@@ -116,7 +116,9 @@ static int umr_get_wave_status_vi(struct umr_asic *asic, 
unsigned se, unsigned s
read(asic->fd.wave, &buf, 32*4);
} else {
int n = 0;
+   umr_grbm_select_index(asic, se, sh, cu);
read_wave_status_via_mmio(asic, simd, wave, &buf[0], &n);
+   umr_grbm_select_index(asic, 0x, 0x, 0x);
}
  
  	if (buf[0] != 0) {

@@ -218,7 +220,9 @@ static int umr_get_wave_status_ai(struct umr_asic *asic, 
unsigned se, unsigned s
read(asic->fd.wave, &buf, 32*4);
} else {
int n = 0;
+   umr_grbm_select_index(asic, se, sh, cu);
read_wave_status_via_mmio(asic, simd, wave, &buf[0], &n);
+   umr_grbm_select_index(asic, 0x, 0x, 0x);
}
  
  	if (buf[0] != 1) {







Re: [PATCH libdrm 1/2] headers: Sync amdgpu_drm.h with drm-next

2017-10-23 Thread Nicolai Hähnle

Both patches:

Reviewed-by: Nicolai Hähnle 


On 20.10.2017 16:57, Andres Rodriguez wrote:

Generated using make headers_install from:
airlied/drm-next 282dc83 Merge tag 'drm-intel-next-2017-10-12' ...

Signed-off-by: Andres Rodriguez 
---
  include/drm/amdgpu_drm.h | 31 ++-
  1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
index 4c6e8c4..ff01818 100644
--- a/include/drm/amdgpu_drm.h
+++ b/include/drm/amdgpu_drm.h
@@ -53,6 +53,7 @@ extern "C" {
  #define DRM_AMDGPU_WAIT_FENCES0x12
  #define DRM_AMDGPU_VM 0x13
  #define DRM_AMDGPU_FENCE_TO_HANDLE0x14
+#define DRM_AMDGPU_SCHED   0x15
  
  #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)

  #define DRM_IOCTL_AMDGPU_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -69,6 +70,7 @@ extern "C" {
  #define DRM_IOCTL_AMDGPU_WAIT_FENCES  DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_WAIT_FENCES, union drm_amdgpu_wait_fences)
  #define DRM_IOCTL_AMDGPU_VM   DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_VM, union drm_amdgpu_vm)
  #define DRM_IOCTL_AMDGPU_FENCE_TO_HANDLE DRM_IOWR(DRM_COMMAND_BASE + 
DRM_AMDGPU_FENCE_TO_HANDLE, union drm_amdgpu_fence_to_handle)
+#define DRM_IOCTL_AMDGPU_SCHED DRM_IOW(DRM_COMMAND_BASE + 
DRM_AMDGPU_SCHED, union drm_amdgpu_sched)
  
  #define AMDGPU_GEM_DOMAIN_CPU		0x1

  #define AMDGPU_GEM_DOMAIN_GTT 0x2
@@ -91,6 +93,8 @@ extern "C" {
  #define AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS (1 << 5)
  /* Flag that BO is always valid in this VM */
  #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID (1 << 6)
+/* Flag that BO sharing will be explicitly synchronized */
+#define AMDGPU_GEM_CREATE_EXPLICIT_SYNC(1 << 7)
  
  struct drm_amdgpu_gem_create_in  {

/** the requested memory size */
@@ -166,13 +170,22 @@ union drm_amdgpu_bo_list {
  /* unknown cause */
  #define AMDGPU_CTX_UNKNOWN_RESET  3
  
+/* Context priority level */

+#define AMDGPU_CTX_PRIORITY_UNSET   -2048
+#define AMDGPU_CTX_PRIORITY_VERY_LOW-1023
+#define AMDGPU_CTX_PRIORITY_LOW -512
+#define AMDGPU_CTX_PRIORITY_NORMAL  0
+/* Selecting a priority above NORMAL requires CAP_SYS_NICE or DRM_MASTER */
+#define AMDGPU_CTX_PRIORITY_HIGH512
+#define AMDGPU_CTX_PRIORITY_VERY_HIGH   1023
+
  struct drm_amdgpu_ctx_in {
/** AMDGPU_CTX_OP_* */
__u32   op;
/** For future use, no flags defined so far */
__u32   flags;
__u32   ctx_id;
-   __u32   _pad;
+   __s32   priority;
  };
  
  union drm_amdgpu_ctx_out {

@@ -216,6 +229,21 @@ union drm_amdgpu_vm {
struct drm_amdgpu_vm_out out;
  };
  
+/* sched ioctl */

+#define AMDGPU_SCHED_OP_PROCESS_PRIORITY_OVERRIDE  1
+
+struct drm_amdgpu_sched_in {
+   /* AMDGPU_SCHED_OP_* */
+   __u32   op;
+   __u32   fd;
+   __s32   priority;
+   __u32   flags;
+};
+
+union drm_amdgpu_sched {
+   struct drm_amdgpu_sched_in in;
+};
+
  /*
   * This is not a reliable API and you should expect it to fail for any
   * number of reasons and have fallback path that do not use userptr to
@@ -629,6 +657,7 @@ struct drm_amdgpu_cs_chunk_data {
#define AMDGPU_INFO_SENSOR_VDDGFX   0x7
  /* Number of VRAM page faults on CPU access. */
  #define AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS  0x1E
+#define AMDGPU_INFO_VRAM_LOST_COUNTER  0x1F
  
  #define AMDGPU_INFO_MMR_SE_INDEX_SHIFT	0

  #define AMDGPU_INFO_MMR_SE_INDEX_MASK 0xff






Re: [PATCH] drm/amd/display: Trigger bug when allocation fails.

2017-10-17 Thread Nicolai Hähnle

On 17.10.2017 19:45, Tom St Denis wrote:

If the allocation fails in amdgpu_dm_connector_funcs_reset() the
API cannot continue so trigger a BUG_ON.


That seems questionable to be honest. The drm_atomic_helper version of 
this function ends up setting connector->state = NULL; in this case.


Cheers,
Nicolai



Signed-off-by: Tom St Denis 
---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 58e29a2a5ca6..ac58ba4f10cf 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2722,6 +2722,7 @@ void amdgpu_dm_connector_funcs_reset(struct drm_connector 
*connector)
kfree(state);
  
  	state = kzalloc(sizeof(*state), GFP_KERNEL);

+   BUG_ON(state == NULL);
  
  	if (state) {

state->scaling = RMX_OFF;






Re: regression with d6c650c0a8f6f671e49553725e1db541376d95f2

2017-10-17 Thread Nicolai Hähnle

On 13.10.2017 10:46, Liu, Monk wrote:
Just revert Nicolai's patch; if another routine wants to reference 
s_fence, it should get the finished fence in the first place.


The gpu_reset routine refers to s_fence only for the unfinished jobs in 
sched_hw_job_reset, so it is totally safe to reference the s_fence pointer.


I wonder what issue Nicolai ran into that prompted this patch.


The original motivation of my patch was to catch accidental use of 
job->s_fence after the fence was destroyed in amd_sched_process_job. 
Basically, prevent a dangling pointer. I still think that's a reasonable 
idea, though clearly my first attempt at it was just wrong.


In Christian's v2 patch, it might make sense to add

spin_unlock(&sched->job_list_lock);
+   dma_fence_put(&s_job->s_fence->finished);
+   s_job->s_fence = NULL;
sched->ops->free_job(s_job);

... though I'm not 100% certain about how the fence lifetimes work.

Cheers,
Nicolai




BR Monk

*From:*amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] *On Behalf 
Of *Liu, Monk

*Sent:* October 13, 2017 16:40
*To:* Koenig, Christian ; Nicolai Hähnle 
; amd-gfx@lists.freedesktop.org

*Subject:* RE: regression with d6c650c0a8f6f671e49553725e1db541376d95f2

I doubt it would always work fine…

First, we have FENCE_TRACE referencing s_fence->finished after 
"fence_signal(&fence->finished)".


Second, we have trace_amd_sched_process_job(s_fence) after 
"amd_sched_fence_finished()".


If you put the finished fence before free_job() and, by coincidence, 
job_finish() gets executed very soon, you risk hitting a wild pointer in 
the above two cases.


BR Monk

*From:*Koenig, Christian
*Sent:* October 13, 2017 16:17
*To:* Liu, Monk mailto:monk@amd.com>>; Nicolai 
Hähnle mailto:nhaeh...@gmail.com>>; 
amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>

*Subject:* Re: regression with d6c650c0a8f6f671e49553725e1db541376d95f2

Yeah, that change is actually incorrect and should be reverted.

What we really need to do is remove dropping sched_job->s_fence from 
amd_sched_process_job() into amd_sched_job_finish() directly before the 
call to free_job().


Regards,
Christian.

On 13.10.2017 09:24, Liu, Monk wrote:

    commit d6c650c0a8f6f671e49553725e1db541376d95f2
Author: Nicolai Hähnle 
<mailto:nicolai.haeh...@amd.com>
@@ -611,6 +611,10 @@ static int amd_sched_main(void *param)

     fence = sched->ops->run_job(sched_job);
     amd_sched_fence_scheduled(s_fence);
+
+   /* amd_sched_process_job drops the job's reference
of the fence. */
+   sched_job->s_fence = NULL;
+
     if (fence) {
     s_fence->parent = dma_fence_get(fence);
     r = dma_fence_add_callback(fence, &s_fence->cb,

Hi Nicolai

with this patch, you will break the "amd_sched_hw_job_reset()" routine:

void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched)
{
	struct amd_sched_job *s_job;

	spin_lock(&sched->job_list_lock);
	list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) {
		if (s_job->s_fence->parent &&
		    fence_remove_callback(s_job->s_fence->parent,
					  &s_job->s_fence->cb)) {
			fence_put(s_job->s_fence->parent);
			s_job->s_fence->parent = NULL;
			atomic_dec(&sched->hw_rq_count);
		}
	}
	spin_unlock(&sched->job_list_lock);
}

See that without sched_job->s_fence, you cannot remove the callback 
from its hw fence.

Any ideas?

BR Monk






Re: TDR and VRAM lost handling in KMD (v2)

2017-10-13 Thread Nicolai Hähnle


That sounds like a good idea to me, but I'm not sure if it won't
break userspace (I don't think so). Nicolai or Marek need to comment.


  Second, I want to say that the KERNEL shouldn't use terms
  from MESA or OGL or VULKAN, e.g. the kernel shouldn't use
  AMDGPU_INNOCENT_RESET to map to GL_INNOCENT_RESET_ARB, etc...

  Because that way the kernel would be tied to certain UMDs, so I
  suggest we completely rename the context status, and each UMD has its
  own way to map the kernel context's result to a
  gl-context/vk-context/etc…


Yes, completely agree.


  The kernel should only provide the **FLAG bits** below on a given context:

  ·Define AMDGPU_CTX_STATE_GUILTY 0x1 //as long as TDR
  detects a job hang, KMD sets the context behind that job as "guilty"

  ·Define AMDGPU_CTX_STATE_VRAMLOST 0x2 //as long as
  there is a VRAM lost event after this context was created, we mark
  this context "VRAM_LOST", so the UMD can say that all BOs under this
  context may have lost their content; since the kernel has no
  relationship between contexts and BOs, it is the UMD's call to judge
  whether a BO is considered "VRAM lost" or not.

  ·Define AMDGPU_CTX_STATE_RESET 0x3 //as long as a GPU
  reset occurred after context creation, this flag shall be set


That sounds sane, but unfortunately might not be possible with the
existing IOCTL. Keep in mind that we need to keep backward
compatibility here.


  Sample code:

  int amdgpu_ctx_query(struct amdgpu_ctx_query_parm *out, ...) {
      if (ctx->vram_lost_counter != adev->vram_lost_counter)
          out->status |= AMDGPU_CTX_STATE_VRAM_LOST;
      if (ctx->reset_counter != adev->reset_counter) {
          out->status |= AMDGPU_CTX_STATE_RESET;
          if (ctx->guilty == TRUE)
              out->status |= AMDGPU_CTX_STATE_GUILTY;
      }
      return 0;
  }

  For the UMD, if it finds "out.status == 0", it means no GPU reset ever
  occurred and the context is completely regular,

  ·*A new IOCTL added for context:*

  void amdgpu_ctx_reinit() {
      ctx->vram_lost_counter = adev->vram_lost_counter;
      ctx->reset_counter = adev->reset_counter;
  }


Mhm, is there any advantage to just creating a new context?

Regards,
Christian.


  *If the UMD decides *not* to release the "guilty" context but to
  continue using it after the UMD acknowledged a GPU hang on a certain
  job/context, I suggest the UMD call "amdgpu_ctx_reinit()":*

  That way, after you re-init() this context, you can get an updated
  result from "amdgpu_ctx_query", which will probably give you
  "out.status == 0" as long as no GPU reset/VRAM loss hits after the re-init().

  BR Monk

  -Original Message-
  From: Koenig, Christian
  Sent: October 12, 2017 18:13
  To: Haehnle, Nicolai 
  <mailto:nicolai.haeh...@amd.com>; Michel Dänzer 
  <mailto:mic...@daenzer.net>; Liu, Monk 
  <mailto:monk@amd.com>; Olsak, Marek 
  <mailto:marek.ol...@amd.com>; Deucher, Alexander
   <mailto:alexander.deuc...@amd.com>;
  Zhou, David(ChunMing) 
  <mailto:david1.z...@amd.com>; Mao, David 
  <mailto:david@amd.com>
  Cc: Ramirez, Alejandro 
  <mailto:alejandro.rami...@amd.com>; amd-gfx@lists.freedesktop.org
  <mailto:amd-gfx@lists.freedesktop.org>; Filipas, Mario
   <mailto:mario.fili...@amd.com>; Ding, Pixel
   <mailto:pixel.d...@amd.com>; Li, Bingley
   <mailto:bingley...@amd.com>; Jiang, Jerry (SW)
   <mailto:jerry.ji...@amd.com>
  Subject: Re: TDR and VRAM lost handling in KMD (v2)

  On 12.10.2017 11:44, Nicolai Hähnle wrote:

  > On 12.10.2017 11:35, Michel Dänzer wrote:

  >> On 12/10/17 11:23 AM, Christian König wrote:

  >>> On 12.10.2017 11:10, Nicolai Hähnle wrote:

  >>>> On 12.10.2017 10:49, Christian König wrote:

  >>>>>> However, !guilty && ctx->reset_counter !=
adev->reset_counter

  >>>>>> does not imply that the context was lost.

  >>>>>>

  >>>>>> The way I understand it, we should return

  >>>>>> AMDGPU_CTX_INNOCENT_RESET if !guilty && ctx->vram_lost_counter != 
adev->vram_lost_counter.

  >>>>>>

  >>>>>> As far as I understand it, the case of !guilty &&

  >>>>>> ctx->reset_counter != adev->reset_counter &&

  >>>>>> ctx->vram_lost_counter ==


Re: TDR and VRAM lost handling in KMD (v2)

2017-10-13 Thread Nicolai Hähnle

·*A new IOCTL added for context:*

void amdgpu_ctx_reinit() {
    ctx->vram_lost_counter = adev->vram_lost_counter;
    ctx->reset_counter = adev->reset_counter;
}


Mhm, is there any advantage to just creating a new context?

Regards,
Christian.


*If the UMD decides *not* to release the "guilty" context but to continue
using it after the UMD acknowledged a GPU hang on a certain job/context,
I suggest the UMD call "amdgpu_ctx_reinit()":*

That way, after you re-init() this context, you can get an updated
result from "amdgpu_ctx_query", which will probably give you
"out.status == 0" as long as no GPU reset/VRAM loss hits after the re-init().

BR Monk

-Original Message-
From: Koenig, Christian
Sent: October 12, 2017 18:13
To: Haehnle, Nicolai 
<mailto:nicolai.haeh...@amd.com>; Michel Dänzer 
<mailto:mic...@daenzer.net>; Liu, Monk 
<mailto:monk@amd.com>; Olsak, Marek 
<mailto:marek.ol...@amd.com>; Deucher, Alexander
 <mailto:alexander.deuc...@amd.com>;
Zhou, David(ChunMing) 
<mailto:david1.z...@amd.com>; Mao, David 
<mailto:david@amd.com>
Cc: Ramirez, Alejandro 
<mailto:alejandro.rami...@amd.com>; amd-gfx@lists.freedesktop.org
<mailto:amd-gfx@lists.freedesktop.org>; Filipas, Mario
 <mailto:mario.fili...@amd.com>; Ding, Pixel
 <mailto:pixel.d...@amd.com>; Li, Bingley
 <mailto:bingley...@amd.com>; Jiang, Jerry (SW)
 <mailto:jerry.ji...@amd.com>
Subject: Re: TDR and VRAM lost handling in KMD (v2)

On 12.10.2017 11:44, Nicolai Hähnle wrote:

> On 12.10.2017 11:35, Michel Dänzer wrote:

>> On 12/10/17 11:23 AM, Christian König wrote:

>>> On 12.10.2017 11:10, Nicolai Hähnle wrote:

>>>> On 12.10.2017 10:49, Christian König wrote:

>>>>>> However, !guilty && ctx->reset_counter != adev->reset_counter

>>>>>> does not imply that the context was lost.

>>>>>>

>>>>>> The way I understand it, we should return

>>>>>> AMDGPU_CTX_INNOCENT_RESET if !guilty && ctx->vram_lost_counter != 
adev->vram_lost_counter.

>>>>>>

>>>>>> As far as I understand it, the case of !guilty &&

>>>>>> ctx->reset_counter != adev->reset_counter &&

>>>>>> ctx->vram_lost_counter ==

>>>>>> adev->vram_lost_counter should return AMDGPU_CTX_NO_RESET,

>>>>>> adev->because a

>>>>>> GPU reset occurred, but it didn't affect our context.

>>>>> I disagree on that.

>>>>>

>>>>> AMDGPU_CTX_INNOCENT_RESET just means what it does currently, there

>>>>> was a reset but we haven't been causing it.

>>>>>

>>>>> That the OpenGL extension is specified otherwise is unfortunate,

>>>>> but I think we shouldn't use that for the kernel interface here.

>>>> Two counterpoints:

>>>>

>>>> 1. Why should any application care that there was a reset while it

>>>> was idle? The OpenGL behavior is what makes sense.

>>>

>>> The application is certainly not interest if a reset happened or

>>> not, but I though that the driver stack might be.

>>>

>>>>

>>>> 2. AMDGPU_CTX_INNOCENT_RESET doesn't actually mean anything today

>>>> because we never return it :)

>>>>

>>>

>>> Good point.

>>>

>>>> amdgpu_ctx_query only ever returns AMDGPU_CTX_UNKNOWN_RESET, which

>>>> is in line with the OpenGL spec: we're conservatively returning

>>>> that a reset happened because we don't know whether the context was

>>>> affected, and we return UNKNOWN because we also don't know whether

>>>> the context was guilty or not.

>>>>

>>>> Returning AMDGPU_CTX_NO_RESET in the case of !guilty &&

>>>> ctx->vram_lost_counter == adev->vram_lost_counter is simply a

>>>> refinement and improvement of the current, overly conservative

>>>> behavior.

>>>

>>> Ok let's reenumerate what I think the different return values should

>>> mean:

>>>

>>> * AMDGPU_CTX_GUILTY_RESET

>>>

>>> guilty is set to true for this context.

>>

Re: TDR and VRAM lost handling in KMD (v2)

2017-10-12 Thread Nicolai Hähnle

On 12.10.2017 13:49, Christian König wrote:

On 12.10.2017 13:37, Liu, Monk wrote:

Hi team
Very good, many of the policies and implementation details are agreed 
upon; it looks like we only have some arguments about amdgpu_ctx_query(). 
Well, I am also confused by its current implementation, ☹
First, I want to know if you guys agree that we *don't update 
ctx->reset_counter in amdgpu_ctx_query()*? Because I want the query 
result to always be consistent for a given context,


That sounds like a good idea to me, but I'm not sure if it won't break 
userspace (I don't think so). Nicolai or Marek need to comment.


That should be fine for Mesa.


Second, I want to say that the KERNEL shouldn't use terms from 
MESA or OGL or VULKAN, e.g. the kernel shouldn't use AMDGPU_INNOCENT_RESET 
to map to GL_INNOCENT_RESET_ARB, etc...
Because that way the kernel would be tied to certain UMDs, so I 
suggest we completely rename the context status, and each UMD has its 
own way to map the kernel context's result to a gl-context/vk-context/etc…


Yes, completely agree.


The kernel should only provide the **FLAG bits** below on a given context:

  * Define AMDGPU_CTX_STATE_GUILTY 0x1 //as long as TDR
detects a job hang, KMD sets the context behind that job as
"guilty"
  * Define AMDGPU_CTX_STATE_VRAMLOST 0x2 //as long as
there is a VRAM lost event after this context was created, we mark
this context "VRAM_LOST", so the UMD can say that all BOs under this
context may have lost their content; since the kernel has no
relationship between contexts and BOs, it is the UMD's call to judge
whether a BO is considered "VRAM lost" or not.
  * Define AMDGPU_CTX_STATE_RESET 0x3 //as long as a GPU
reset occurred after context creation, this flag shall be set



That sounds sane, but unfortunately might not be possible with the 
existing IOCTL. Keep in mind that we need to keep backward compatibility 
here.


Agreed. FWIW, when Mesa sees an unknown (new) value being returned from 
amdgpu_ctx_query, it treats it the same as AMDGPU_CTX_NO_RESET.


Cheers,
Nicolai



Sample code:
int amdgpu_ctx_query(struct amdgpu_ctx_query_parm *out, ...) {
    if (ctx->vram_lost_counter != adev->vram_lost_counter)
        out->status |= AMDGPU_CTX_STATE_VRAM_LOST;
    if (ctx->reset_counter != adev->reset_counter) {
        out->status |= AMDGPU_CTX_STATE_RESET;
        if (ctx->guilty == TRUE)
            out->status |= AMDGPU_CTX_STATE_GUILTY;
    }
    return 0;
}
For the UMD, if it finds "out.status == 0", it means no GPU reset ever 
occurred and the context is completely regular.


  * *A new IOCTL added for context:*

void amdgpu_ctx_reinit() {
    ctx->vram_lost_counter = adev->vram_lost_counter;
    ctx->reset_counter = adev->reset_counter;
}


Mhm, is there any advantage to just creating a new context?

Regards,
Christian.

*If the UMD decides *not* to release the "guilty" context but to continue 
using it after the UMD acknowledged a GPU hang on a certain job/context, 
I suggest the UMD call "amdgpu_ctx_reinit()":*
That way, after you re-init() this context, you can get an updated result 
from "amdgpu_ctx_query", which will probably give you "out.status == 
0" as long as no GPU reset/VRAM loss hits after the re-init().

BR Monk
-Original Message-
From: Koenig, Christian
Sent: October 12, 2017 18:13
To: Haehnle, Nicolai ; Michel Dänzer 
; Liu, Monk ; Olsak, Marek 
; Deucher, Alexander ; 
Zhou, David(ChunMing) ; Mao, David 

Cc: Ramirez, Alejandro ; 
amd-gfx@lists.freedesktop.org; Filipas, Mario ; 
Ding, Pixel ; Li, Bingley ; 
Jiang, Jerry (SW) 

Subject: Re: TDR and VRAM lost handling in KMD (v2)
On 12.10.2017 11:44, Nicolai Hähnle wrote:
> On 12.10.2017 11:35, Michel Dänzer wrote:
>> On 12/10/17 11:23 AM, Christian König wrote:
>>> On 12.10.2017 11:10, Nicolai Hähnle wrote:
>>>> On 12.10.2017 10:49, Christian König wrote:
>>>>>> However, !guilty && ctx->reset_counter != adev->reset_counter
>>>>>> does not imply that the context was lost.
>>>>>>
>>>>>> The way I understand it, we should return
>>>>>> AMDGPU_CTX_INNOCENT_RESET if !guilty && ctx->vram_lost_counter != 
adev->vram_lost_counter.
>>>>>>
>>>>>> As far as I understand it, the case of !guilty &&
>>>>>> ctx->reset_counter != adev->reset_counter &&
>>>>>> ctx->vram_lost_counter ==
>>>>>> adev->vram_lost_counter should return AMDGPU_CTX_NO_RESET,
>>>>>> adev->because a
>>>>>> GPU reset occurred, but it didn't affect our context.
>>>>> I disagree on that.
>>>>>
>>>>> AMDGPU_CTX_INNOCENT_RESET 

Re: TDR and VRAM lost handling in KMD (v2)

2017-10-12 Thread Nicolai Hähnle

On 12.10.2017 11:35, Michel Dänzer wrote:

On 12/10/17 11:23 AM, Christian König wrote:

On 12.10.2017 11:10, Nicolai Hähnle wrote:

On 12.10.2017 10:49, Christian König wrote:

However, !guilty && ctx->reset_counter != adev->reset_counter does
not imply that the context was lost.

The way I understand it, we should return AMDGPU_CTX_INNOCENT_RESET
if !guilty && ctx->vram_lost_counter != adev->vram_lost_counter.

As far as I understand it, the case of !guilty && ctx->reset_counter
!= adev->reset_counter && ctx->vram_lost_counter ==
adev->vram_lost_counter should return AMDGPU_CTX_NO_RESET, because a
GPU reset occurred, but it didn't affect our context.

I disagree on that.

AMDGPU_CTX_INNOCENT_RESET just means what it does currently, there
was a reset but we haven't been causing it.

That the OpenGL extension is specified otherwise is unfortunate, but
I think we shouldn't use that for the kernel interface here.

Two counterpoints:

1. Why should any application care that there was a reset while it was
idle? The OpenGL behavior is what makes sense.


The application is certainly not interest if a reset happened or not,
but I though that the driver stack might be.



2. AMDGPU_CTX_INNOCENT_RESET doesn't actually mean anything today
because we never return it :)



Good point.


amdgpu_ctx_query only ever returns AMDGPU_CTX_UNKNOWN_RESET, which is
in line with the OpenGL spec: we're conservatively returning that a
reset happened because we don't know whether the context was affected,
and we return UNKNOWN because we also don't know whether the context
was guilty or not.

Returning AMDGPU_CTX_NO_RESET in the case of !guilty &&
ctx->vram_lost_counter == adev->vram_lost_counter is simply a
refinement and improvement of the current, overly conservative behavior.


Ok let's reenumerate what I think the different return values should mean:

* AMDGPU_CTX_GUILTY_RESET

guilty is set to true for this context.

* AMDGPU_CTX_INNOCENT_RESET

guilty is not set and vram_lost_counter has changed.

* AMDGPU_CTX_UNKNOWN_RESET

guilty is not set and vram_lost_counter has not changed, but
gpu_reset_counter has changed.


I don't understand the distinction you're proposing between
AMDGPU_CTX_INNOCENT_RESET and AMDGPU_CTX_UNKNOWN_RESET. I think both
cases you're describing should return either AMDGPU_CTX_INNOCENT_RESET,
if the value of guilty is reliable, or AMDGPU_CTX_UNKNOWN_RESET if it's not.


I think it'd make more sense if it was called "AMDGPU_CTX_UNAFFECTED_RESET".

So:
- AMDGPU_CTX_GUILTY_RESET --> the context was affected by a reset, and 
we know that it's the context's fault
- AMDGPU_CTX_INNOCENT_RESET --> the context was affected by a reset, and 
we know that it *wasn't* the context's fault (no context job active)
- AMDGPU_CTX_UNKNOWN_RESET --> the context was affected by a reset, and 
we don't know who's responsible (this could be returned in the unlikely 
case where context A's gfx job has not yet finished, but context B's gfx 
job has already started; it could be the fault of A, it could be the 
fault of B -- which somehow manages to hang a part of the hardware that 
then prevents A's job from finishing -- or it could be both; but it's a 
bit academic)
- AMDGPU_CTX_UNAFFECTED_RESET --> there was a reset, but this context 
wasn't affected


This last value would currently just be discarded by Mesa (because we 
should only disturb applications when we have to), but perhaps somebody 
else could find it useful?


Cheers,
Nicolai


Re: TDR and VRAM lost handling in KMD (v2)

2017-10-12 Thread Nicolai Hähnle

On 12.10.2017 10:49, Christian König wrote:
However, !guilty && ctx->reset_counter != adev->reset_counter does not 
imply that the context was lost.


The way I understand it, we should return AMDGPU_CTX_INNOCENT_RESET if 
!guilty && ctx->vram_lost_counter != adev->vram_lost_counter.


As far as I understand it, the case of !guilty && ctx->reset_counter 
!= adev->reset_counter && ctx->vram_lost_counter == 
adev->vram_lost_counter should return AMDGPU_CTX_NO_RESET, because a 
GPU reset occurred, but it didn't affect our context.

I disagree on that.

AMDGPU_CTX_INNOCENT_RESET just means what it does currently, there was a 
reset but we haven't been causing it.


That the OpenGL extension is specified otherwise is unfortunate, but I 
think we shouldn't use that for the kernel interface here.

Two counterpoints:

1. Why should any application care that there was a reset while it was 
idle? The OpenGL behavior is what makes sense.


2. AMDGPU_CTX_INNOCENT_RESET doesn't actually mean anything today 
because we never return it :)


amdgpu_ctx_query only ever returns AMDGPU_CTX_UNKNOWN_RESET, which is in 
line with the OpenGL spec: we're conservatively returning that a reset 
happened because we don't know whether the context was affected, and we 
return UNKNOWN because we also don't know whether the context was guilty 
or not.


Returning AMDGPU_CTX_NO_RESET in the case of !guilty && 
ctx->vram_lost_counter == adev->vram_lost_counter is simply a refinement 
and improvement of the current, overly conservative behavior.
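The refinement argued for above can be sketched as a small decision function. This is a mock, not the real driver code: the struct and function names are made up, and the real implementation would read atomic counters under the appropriate locks.

```c
#include <assert.h>
#include <stdbool.h>

enum reset_status { NO_RESET, GUILTY_RESET, INNOCENT_RESET, UNKNOWN_RESET };

/* Mock types standing in for the real amdgpu structs. */
struct ctx  { bool guilty; unsigned vram_lost_counter; };
struct adev { unsigned vram_lost_counter; };

/* Guilty contexts report GUILTY_RESET; a VRAM-lost mismatch means the
 * context was lost while innocent; matching counters report NO_RESET,
 * even if some unrelated reset occurred on the device. */
static enum reset_status ctx_query_status(const struct ctx *ctx,
                                          const struct adev *adev)
{
	if (ctx->guilty)
		return GUILTY_RESET;
	if (ctx->vram_lost_counter != adev->vram_lost_counter)
		return INNOCENT_RESET;
	return NO_RESET;
}
```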


Cheers,
Nicolai




Regards,
Christian.

On 12.10.2017 10:44, Nicolai Hähnle wrote:
I think we should stick to the plan where kernel contexts stay "stuck" 
after a GPU reset. This is the most robust behavior for the kernel.


Even if the OpenGL spec says that an OpenGL context can be re-used 
without destroying and re-creating it, the UMD can take care of 
re-creating the kernel context.


This means amdgpu_ctx_query should *not* reset ctx->reset_counter.

Cheers,
Nicolai


On 12.10.2017 10:41, Nicolai Hähnle wrote:

Hi Monk,

Thanks for the summary. Most of it looks good to me, though I can't 
speak to all the kernel internals.


Just some comments:

On 12.10.2017 10:03, Liu, Monk wrote:

- For cs_submit() IOCTL:

1.Check if the current ctx has been marked “*guilty*” and return 
“*ECANCELED*” if so.


2.set job->*vram_lost_counter* with adev->*vram_lost_counter*, and 
return “*ECANCELED*” if ctx->*vram_lost_counter* != 
job->*vram_lost_counter* (Christian already submitted this patch)


a)discussion: can we return “ENODEV” if vram_lost_counter mismatches? 
That way UMD knows this context is under “device lost”


My plan for UMD is to always query the VRAM lost counter when any 
kind of context lost situation is detected. So cs_submit() should 
return an error in this situation, but it could just be ECANCELED. We 
don't need to distinguish between different types of errors here.



- Introduce a new IOCTL to let UMD query the latest 
adev->*vram_lost_counter*:


Christian already sent a patch for this.



- For amdgpu_ctx_query():

- *Don’t update ctx->reset_counter when querying this function, 
otherwise the query result is not consistent*


Hmm. I misremembered part of the spec, see below.


- Set out->state.reset_status to “AMDGPU_CTX_GUILTY_RESET” if the ctx 
is “*guilty*”, no need to check “ctx->reset_counter”


Agreed.


- Set out->state.reset_status to “AMDGPU_CTX_INNOCENT_RESET” *if the 
ctx isn’t “guilty” && ctx->reset_counter != adev->reset_counter*


I disagree. The meaning of AMDGPU_CTX_*_RESET should reflect the 
corresponding enums in user space APIs. I don't know how it works in 
Vulkan, but in OpenGL, returning GL_INNOCENT_CONTEXT_RESET_ARB means 
that the context was lost.


However, !guilty && ctx->reset_counter != adev->reset_counter does 
not imply that the context was lost.


The way I understand it, we should return AMDGPU_CTX_INNOCENT_RESET 
if !guilty && ctx->vram_lost_counter != adev->vram_lost_counter.


As far as I understand it, the case of !guilty && ctx->reset_counter 
!= adev->reset_counter && ctx->vram_lost_counter == 
adev->vram_lost_counter should return AMDGPU_CTX_NO_RESET, because a 
GPU reset occurred, but it didn't affect our context.


I unfortunately noticed another subtlety while re-reading the OpenGL 
spec. OpenGL says that the OpenGL context itself does *not* have to 
be re-created in order to recover from the reset. Re-creating all 
objects in the context is sufficient.


I believe this is the original motivation for why amdgpu_ctx_query() 
will reset the ctx->reset_counter.


For Mesa, it's still okay if the kernel keeps blocking submissions as 
we can just recreate the kernel context. But OrcaGL is also affected.


Does anybody know off-hand where the relevant parts of the Vulkan spec 
are? I didn't actually find anything in a quick search.

Re: TDR and VRAM lost handling in KMD (v2)

2017-10-12 Thread Nicolai Hähnle

Hi Monk,

Thanks for the summary. Most of it looks good to me, though I can't 
speak to all the kernel internals.


Just some comments:

On 12.10.2017 10:03, Liu, Monk wrote:

- For cs_submit() IOCTL:

1.Check if the current ctx has been marked “*guilty*” and return 
“*ECANCELED*” if so.


2.set job->*vram_lost_counter* with adev->*vram_lost_counter*, and 
return “*ECANCELED*” if ctx->*vram_lost_counter* != 
job->*vram_lost_counter* (Christian already submitted this patch)


a)discussion: can we return “ENODEV” if vram_lost_counter mismatches? 
That way UMD knows this context is under “device lost”


My plan for UMD is to always query the VRAM lost counter when any kind 
of context lost situation is detected. So cs_submit() should return an 
error in this situation, but it could just be ECANCELED. We don't need 
to distinguish between different types of errors here.




- Introduce a new IOCTL to let UMD query the latest adev->*vram_lost_counter*:


Christian already sent a patch for this.



- For amdgpu_ctx_query():

- *Don’t update ctx->reset_counter when querying this function, otherwise 
the query result is not consistent*


Hmm. I misremembered part of the spec, see below.


- Set out->state.reset_status to “AMDGPU_CTX_GUILTY_RESET” if the ctx is 
“*guilty*”, no need to check “ctx->reset_counter”


Agreed.


- Set out->state.reset_status to “AMDGPU_CTX_INNOCENT_RESET” *if the ctx 
isn’t “guilty” && ctx->reset_counter != adev->reset_counter*


I disagree. The meaning of AMDGPU_CTX_*_RESET should reflect the 
corresponding enums in user space APIs. I don't know how it works in 
Vulkan, but in OpenGL, returning GL_INNOCENT_CONTEXT_RESET_ARB means 
that the context was lost.


However, !guilty && ctx->reset_counter != adev->reset_counter does not 
imply that the context was lost.


The way I understand it, we should return AMDGPU_CTX_INNOCENT_RESET if 
!guilty && ctx->vram_lost_counter != adev->vram_lost_counter.


As far as I understand it, the case of !guilty && ctx->reset_counter != 
adev->reset_counter && ctx->vram_lost_counter == adev->vram_lost_counter 
should return AMDGPU_CTX_NO_RESET, because a GPU reset occurred, but it 
didn't affect our context.


I unfortunately noticed another subtlety while re-reading the OpenGL 
spec. OpenGL says that the OpenGL context itself does *not* have to be 
re-created in order to recover from the reset. Re-creating all objects 
in the context is sufficient.


I believe this is the original motivation for why amdgpu_ctx_query() 
will reset the ctx->reset_counter.


For Mesa, it's still okay if the kernel keeps blocking submissions as we 
can just recreate the kernel context. But OrcaGL is also affected.


Does anybody know off-hand where the relevant parts of the Vulkan spec 
are? I didn't actually find anything in a quick search.



[snip]

For UMD behavior we still have something need to consider:

If MESA creates a new context from an old context (share list?? I’m not 
familiar with UMD; David Mao shall have some discussion on it with 
Nicolai), the newly created context’s vram_lost_counter and 
reset_counter shall all be ported from that old context, otherwise 
CS_SUBMIT will not block it, which isn’t correct


The kernel doesn't have to do anything for this, it is entirely the 
UMD's responsibility. All UMD needs from KMD is the function for 
querying the vram_lost_counter.


Cheers,
Nicolai




Need your feedback, thx

*From:*amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] *On Behalf 
Of *Liu, Monk

*Sent:* 2017-10-11 13:34
*To:* Koenig, Christian ; Haehnle, Nicolai 
; Olsak, Marek ; Deucher, 
Alexander 
*Cc:* Ramirez, Alejandro ; 
amd-gfx@lists.freedesktop.org; Filipas, Mario ; 
Ding, Pixel ; Li, Bingley ; 
Jiang, Jerry (SW) 

*Subject:* TDR and VRAM lost handling in KMD:

Hi Christian & Nicolai,

We need to achieve some agreements on what should MESA/UMD do and what 
should KMD do, *please give your comments with **“okay”or “No”and your 
idea on below items,*


- When a job times out (set from the lockup_timeout kernel parameter), 
what KMD should do in the TDR routine:


1.Update adev->*gpu_reset_counter*, and stop scheduler first, 
(*gpu_reset_counter* is used to force vm flush after GPU reset, out of 
this thread’s scope so no more discussion on it)


2.Set its fence error status to “*ETIME*”,

3.Find the entity/ctx behind this job, and set this ctx as “*guilty*”

4.Kick out this job from scheduler’s mirror list, so this job won’t get 
re-scheduled to ring anymore.


5.Kick out all jobs in this “guilty” ctx’s KFIFO queue, and set all their 
fence status to “*ECANCELED*”


6.*Force signal all fences that get kicked out by the above two 
steps, otherwise UMD will block forever if waiting on those fences*


7.Do the GPU reset, which can be some callbacks to let bare-metal and 
SR-IOV implement it in their preferred style


8.After reset, KMD needs to be aware of whether VRAM loss happened or 
not; bare-metal can implement some function to judge, while for SR-IOV I 
prefer to read it from the GIM side (for the initial version we consider 
it’s always VRAM lost, until the GIM side change is aligned)

Re: TDR and VRAM lost handling in KMD (v2)

2017-10-12 Thread Nicolai Hähnle

Hi David,

Agreed. I'd also like to use the first variant (UMD compares the current 
vram_lost count with the share list's vram_lost count) for Mesa.


Cheers,
Nicolai

On 12.10.2017 10:43, Mao, David wrote:

Thanks Monk for the summary!

Hi Nicolai,
In order to block a new context from referencing the old allocations, I 
think we need to do something in the UMD so that the KMD doesn’t need to 
monitor the resource list.

I want to make sure we are on the same page.
If you agree, then there might be two options to do that in UMD (you can 
do whatever you want; I just want to elaborate the idea a little bit to 
facilitate the discussion):
- If the share list is valid, the driver needs to compare the current 
vram_lost_count with the share list’s vram_lost count; context creation 
will fail if the share list was created before the reset.
- Or, we can copy the vram_lost count from the share list, and the kernel 
will fail the submission if the vram_lost count is smaller than the 
current one.
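The two options could be sketched roughly as follows. This is purely illustrative: all names are hypothetical, and the real UMD/KMD split would of course look different.

```c
#include <assert.h>
#include <stddef.h>

/* Mock share list carrying the VRAM-lost count it was created under. */
struct share_list { unsigned vram_lost_count; };

/* Option 1: refuse to create a context whose share list predates the
 * most recent VRAM loss (done entirely in the UMD). */
static int create_context(unsigned current_vram_lost,
                          const struct share_list *share)
{
	if (share && share->vram_lost_count != current_vram_lost)
		return -1; /* share list allocations are stale: fail creation */
	return 0;
}

/* Option 2: inherit the share list's count, so the kernel rejects the
 * submission later instead of failing creation up front. */
static unsigned inherit_count(unsigned current_vram_lost,
                              const struct share_list *share)
{
	return share ? share->vram_lost_count : current_vram_lost;
}
```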

I personally want to go with the first option for OrcaGL.

Thanks.
Best Regards,
David
-
On 12 Oct 2017, at 4:03 PM, Liu, Monk wrote:


V2 summary
Hi team
*please give your comments*
- When a job times out (set from the lockup_timeout kernel parameter), 
what KMD should do in the TDR routine:

1.Update adev->*gpu_reset_counter*, and stop the scheduler first
2.Set its fence error status to “*ECANCELED*”
3.Find the *context* behind this job, and set this *context* as 
“*guilty*” (there will be a new member field in the context 
structure – *bool guilty*)
a)There will be a “*bool *guilty*” in the entity structure, which points 
to its parent context’s member – “*bool guilty*” – when the context is 
initialized, so no matter whether we get the context or the entity, we 
always know if it is “guilty”
b)For a kernel entity that is used for VM updates, there is no context 
backing it, so the kernel entity’s “bool *guilty” is always “NULL”.
c)The idea to skip the whole context is for consistency, because we’ll 
fake-signal the hung job in job_run(), so all jobs in its context shall 
be dropped, otherwise we get either bad drawing/computing results or 
more GPU hangs.

4.Do the GPU reset, which can be some callbacks to let bare-metal and 
SR-IOV implement it in their preferred style
5.After reset, KMD needs to be aware of whether VRAM loss happened or 
not; bare-metal can implement some function to judge, while for SR-IOV I 
prefer to read it from the GIM side (for the initial version we consider 
it’s always VRAM lost, until the GIM side change is aligned)

6.If VRAM loss is hit, update adev->*vram_lost_counter*.
7.Do GTT recovery and shadow buffer recovery.
8.Re-schedule all jobs in the mirror list and restart the scheduler
- For the GPU scheduler function --- job_run():
1.Before scheduling a job to the ring, check if job->*vram_lost_counter* 
== adev->*vram_lost_counter*, and drop this job if they mismatch
2.Before scheduling a job to the ring, check if job->entity->*guilty* is 
NULL or not, *and drop this job if (guilty != NULL && *guilty == TRUE)*

3.If a job is dropped:
a)set the job’s sched_fence status to “*ECANCELED*”
b)fake/force signal the job’s hw fence (no need to set the hw fence’s 
status)
- For the cs_wait() IOCTL:
After it finds the fence signaled, it should check if there is an error 
on this fence and return the error status of this fence

- For the cs_wait_fences() IOCTL:
Similar to the above approach
- For the cs_submit() IOCTL:
1.Check if the current ctx has been marked “*guilty*” and return 
“*ECANCELED*” if so.
2.Set job->*vram_lost_counter* to adev->*vram_lost_counter*, and 
return “*ECANCELED*” if ctx->*vram_lost_counter* != 
job->*vram_lost_counter* (Christian already submitted this patch)
a)discussion: can we return “ENODEV” if vram_lost_counter mismatches? 
That way UMD knows this context is under “device lost”

- Introduce a new IOCTL to let UMD query the latest adev->*vram_lost_counter*:
- For amdgpu_ctx_query():
  - *Don’t update ctx->reset_counter when querying this function, 
otherwise the query result is not consistent*
  - Set out->state.reset_status to “AMDGPU_CTX_GUILTY_RESET” if the ctx 
is “*guilty*”, no need to check “ctx->reset_counter”
  - Set out->state.reset_status to “AMDGPU_CTX_INNOCENT_RESET” *if the 
ctx isn’t “guilty” && ctx->reset_counter != adev->reset_counter*
  - Set out->state.reset_status to “AMDGPU_CTX_NO_RESET” if 
ctx->reset_counter == adev->reset_counter
  - Set out->state.flags to “AMDGPU_CTX_FLAG_VRAM_LOST” if 
ctx->vram_lost_counter != adev->vram_lost_counter
    - discussion: can we return “ENODEV” for amdgpu_ctx_query() if 
ctx->vram_lost_counter != adev->vram_lost_counter? That way UMD knows 
this context is under “device lost”
  - UMD shall release this context if it is AMDGPU_CTX_GUILTY_RESET or 
its flags contain “AMDGPU_CTX_FLAG_VRAM_LOST”

For UMD behavior we still have something to consider:
If MESA creates a new context from an old context (share list?? I’m 
not familiar with UMD; David Mao shall have some discussion on it with 
Nicolai), the newly created context’s vram_lost_counter and 
reset_counter shall all be ported from that old context, otherwise 
CS_SUBMIT will not block it, which isn’t correct
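Monk's proposed amdgpu_ctx_query() rules can be transcribed as a mock (note that Nicolai disputes the INNOCENT rule elsewhere in this thread; this sketch only mirrors the proposal as written, and all names are stand-ins for the real uapi):

```c
#include <assert.h>
#include <stdbool.h>

enum { QS_NO_RESET, QS_GUILTY_RESET, QS_INNOCENT_RESET };
#define FLAG_VRAM_LOST 0x1

struct qctx { bool guilty; unsigned reset_counter; unsigned vram_lost_counter; };
struct qdev { unsigned reset_counter; unsigned vram_lost_counter; };
struct qout { int reset_status; unsigned flags; };

/* Status from guilty/reset_counter, plus a separate VRAM-lost flag;
 * crucially, the query never writes back to ctx->reset_counter. */
static void fill_query(struct qout *out, const struct qctx *c,
                       const struct qdev *d)
{
	if (c->guilty)
		out->reset_status = QS_GUILTY_RESET;
	else if (c->reset_counter != d->reset_counter)
		out->reset_status = QS_INNOCENT_RESET;
	else
		out->reset_status = QS_NO_RESET;

	out->flags = (c->vram_lost_counter != d->vram_lost_counter)
			? FLAG_VRAM_LOST : 0;
}
```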

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Nicolai Hähnle

On 11.10.2017 11:18, Liu, Monk wrote:

Let's keep it simple: when a VRAM loss hits, what's the action for 
amdgpu_ctx_query()/AMDGPU_CTX_OP_QUERY_STATE on other contexts (not the one that 
triggered the GPU hang) after the VRAM loss? Do you mean we return -ENODEV to UMD?


It should successfully return AMDGPU_CTX_INNOCENT_RESET.



In cs_submit, with a VRAM loss hit, if we don't mark all contexts as "guilty", 
how do we block them from submitting? Can you show some way to implement it?


if (ctx->vram_lost_counter != atomic_read(&adev->vram_lost_counter))
return -ECANCELED;

(where ctx->vram_lost_counter is initialized at context creation time 
and never changed afterwards)
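The scheme in the snippet above can be mocked end to end like this (struct and function names are made up; the real code uses atomic counters and the drm ioctl paths):

```c
#include <assert.h>
#include <errno.h>

struct dev  { unsigned vram_lost_counter; };
struct sctx { unsigned vram_lost_counter; };

/* Snapshot the device counter at context creation time; it is set once
 * here and never updated afterwards. */
static void ctx_init(struct sctx *ctx, const struct dev *dev)
{
	ctx->vram_lost_counter = dev->vram_lost_counter;
}

/* cs_submit compares the snapshot against the live device counter and
 * cancels submissions from contexts that predate the VRAM loss. */
static int cs_submit(const struct sctx *ctx, const struct dev *dev)
{
	if (ctx->vram_lost_counter != dev->vram_lost_counter)
		return -ECANCELED;
	return 0;
}
```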




BTW: the "guilty" here is a new member I want to add to the context; it is not 
related to the AMDGPU_CTX_OP_QUERY_STATE user/kernel interface.
Looks like I need to unify them so there is only one place to mark guilty or not.


Right, the AMDGPU_CTX_OP_QUERY_STATE handling needs to be made 
consistent with the rest.


Cheers,
Nicolai





BR Monk

-Original Message-
From: Haehnle, Nicolai
Sent: Wednesday, October 11, 2017 5:00 PM
To: Liu, Monk ; Koenig, Christian ; Olsak, 
Marek ; Deucher, Alexander 
Cc: amd-gfx@lists.freedesktop.org; Ding, Pixel ; Jiang, Jerry (SW) 
; Li, Bingley ; Ramirez, Alejandro 
; Filipas, Mario 
Subject: Re: TDR and VRAM lost handling in KMD:

On 11.10.2017 10:48, Liu, Monk wrote:

On "guilty": "guilty" is a term that's used by APIs (e.g. OpenGL), so
it's reasonable to use it. However, it /does not/ make sense to mark
idle contexts as "guilty" just because VRAM is lost. VRAM lost is a
perfect example where the driver should report context lost to
applications with the "innocent" flag for contexts that were idle at
the time of reset. The only context(s) that should be reported as "guilty"
(or perhaps "unknown" in some cases) are the ones that were executing
at the time of reset.

ML: KMD marks all contexts as guilty because that way we can unify
our IOCTL behavior: e.g. the IOCTL only blocks a “guilty” context, no
need to worry about the vram-lost-counter anymore; that’s an
implementation style. I don’t think it is related to the UMD layer.

The GL context isn’t known to KMD, so UMD can implement its own
“guilty” GL context if it wants.


Well, to some extent this is just semantics, but it helps to keep the 
terminology consistent.

Most importantly, please keep the AMDGPU_CTX_OP_QUERY_STATE uapi in
mind: this returns one of AMDGPU_CTX_{GUILTY,INNOCENT,UNKNOWN}_RESET,
and it must return "innocent" for contexts that are only lost due to VRAM loss 
without being otherwise involved in the timeout that led to the reset.

The point is that in the places where you used "guilty" it would be better to use 
"context lost", and then further differentiate between guilty/innocent context lost based 
on the details of what happened.



If KMD doesn’t mark all ctx as guilty after VRAM loss, can you
illustrate what rule KMD should obey to check in a KMS IOCTL like
cs_submit?? Let’s see which way is better


if (ctx->vram_lost_counter != atomic_read(&adev->vram_lost_counter))
  return -ECANCELED;

Plus similar logic for AMDGPU_CTX_OP_QUERY_STATE.

Yes, it's one additional check in cs_submit. If you're worried about that (and 
Christian's concerns about possible issues with walking over all contexts are 
addressed), I suppose you could just store a per-context

unsigned context_reset_status;

instead of a `bool guilty`. Its value would start out as 0
(AMDGPU_CTX_NO_RESET) and would be set to the correct value during reset.

Cheers,
Nicolai




*From:*Haehnle, Nicolai
*Sent:* Wednesday, October 11, 2017 4:41 PM
*To:* Liu, Monk ; Koenig, Christian
; Olsak, Marek ;
Deucher, Alexander 
*Cc:* amd-gfx@lists.freedesktop.org; Ding, Pixel ;
Jiang, Jerry (SW) ; Li, Bingley
; Ramirez, Alejandro ;
Filipas, Mario 
*Subject:* Re: TDR and VRAM lost handling in KMD:

  From a Mesa perspective, this almost all sounds reasonable to me.

On "guilty": "guilty" is a term that's used by APIs (e.g. OpenGL), so
it's reasonable to use it. However, it /does not/ make sense to mark
idle contexts as "guilty" just because VRAM is lost. VRAM lost is a
perfect example where the driver should report context lost to
applications with the "innocent" flag for contexts that were idle at
the time of reset. The only context(s) that should be reported as "guilty"
(or perhaps "unknown" in some cases) are the ones that were executing
at the time of reset.

On whether the whole context is marked as guilty from a user space
perspective, it would simply be nice for user space to get consistent
answers. It would be a bit odd if we could e.g. succeed in submitting
an SDMA job after a GFX job was rejected. This would point in favor of
marking the entire context as guilty (although that could happen
lazily instead of at reset time). On the other hand, if that's too big
a burden for the kernel implementation I'm sure we can live without it.

Cheers,

Nicolai

-

Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Nicolai Hähnle

On 11.10.2017 11:02, Christian König wrote:
1.Kick out all jobs in this “guilty” ctx’s KFIFO queue, and set all 
their fence status to “*ECANCELED*”


Setting ECANCELED should be ok. But I think we should do this when we 
try to run the jobs and not during GPU reset.


[ML] Without deep thought and experiment, I’m not sure of the difference 
between them, but kicking it out in the gpu_reset routine is more efficient,


I really don't think so. Kicking them out during gpu_reset sounds racy 
to me once more.


And marking them canceled when we try to run them has the clear 
advantage that all dependencies are meet first.


This makes sense to me as well.

It raises a vaguely related question: What happens to jobs whose 
dependencies were canceled? I believe we currently don't check those 
errors, so we might execute them anyway if their contexts were 
unaffected by the reset. There's a risk that the job will hang due to 
stale data.


I don't think it's a huge risk in practice today because we don't have a 
lot of buffer sharing between applications, but it's something to think 
through at some point. In a way, canceling out of an abundance of 
caution may be a bad idea because it could kill a compositor's task by 
being overly conservative.
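One hypothetical way to address the stale-data risk raised above would be for job_run() to inspect the error status of a job's dependency fences before execution and propagate a cancellation. The real scheduler deals in struct dma_fence; this mock only models the error field, and nothing here claims the current driver does this.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for struct dma_fence; only the error status matters here. */
struct mock_fence { int error; };

/* Return the first dependency error found, e.g. -ECANCELED left behind
 * by a dropped job, so the dependent job can be canceled instead of
 * running against stale data. */
static int check_job_deps(struct mock_fence **deps, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (deps[i]->error)
			return deps[i]->error;
	return 0;
}
```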


Cheers,
Nicolai


Re: TDR and VRAM lost handling in KMD:

2017-10-11 Thread Nicolai Hähnle

On 11.10.2017 10:48, Liu, Monk wrote:
On "guilty": "guilty" is a term that's used by APIs (e.g. OpenGL), so 
it's reasonable to use it. However, it /does not/ make sense to mark 
idle contexts as "guilty" just because VRAM is lost. VRAM lost is a 
perfect example where the driver should report context lost to 
applications with the "innocent" flag for contexts that were idle at the 
time of reset. The only context(s) that should be reported as "guilty" 
(or perhaps "unknown" in some cases) are the ones that were executing at 
the time of reset.


ML: KMD marks all contexts as guilty because that way we can unify our 
IOCTL behavior: e.g. the IOCTL only blocks a “guilty” context, no need to 
worry about the vram-lost-counter anymore; that’s an implementation 
style. I don’t think it is related to the UMD layer.


The GL context isn’t known to KMD, so UMD can implement its own 
“guilty” GL context if it wants.


Well, to some extent this is just semantics, but it helps to keep the 
terminology consistent.


Most importantly, please keep the AMDGPU_CTX_OP_QUERY_STATE uapi in 
mind: this returns one of AMDGPU_CTX_{GUILTY,INNOCENT,UNKNOWN}_RESET, 
and it must return "innocent" for contexts that are only lost due to 
VRAM loss without being otherwise involved in the timeout that led to 
the reset.


The point is that in the places where you used "guilty" it would be 
better to use "context lost", and then further differentiate between 
guilty/innocent context lost based on the details of what happened.



If KMD doesn’t mark all ctx as guilty after VRAM loss, can you 
illustrate what rule KMD should obey to check in a KMS IOCTL like 
cs_submit?? Let’s see which way is better


if (ctx->vram_lost_counter != atomic_read(&adev->vram_lost_counter))
return -ECANCELED;

Plus similar logic for AMDGPU_CTX_OP_QUERY_STATE.

Yes, it's one additional check in cs_submit. If you're worried about 
that (and Christian's concerns about possible issues with walking over 
all contexts are addressed), I suppose you could just store a per-context


  unsigned context_reset_status;

instead of a `bool guilty`. Its value would start out as 0 
(AMDGPU_CTX_NO_RESET) and would be set to the correct value during reset.
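The per-context status suggestion could be mocked like this. The marking policy shown ("the hanging context is guilty, everything else hit by VRAM loss is innocent") is a simplification for illustration; names and values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { S_NO_RESET, S_GUILTY, S_INNOCENT };

/* A status field instead of `bool guilty`, so that
 * AMDGPU_CTX_OP_QUERY_STATE could return it directly. */
struct rctx { unsigned context_reset_status; };

/* At reset time, walk the contexts once and record the outcome; values
 * start at S_NO_RESET (0) and are only upgraded here. */
static void mark_contexts(struct rctx *ctxs, size_t n, size_t guilty_idx,
                          int vram_lost)
{
	for (size_t i = 0; i < n; i++) {
		if (i == guilty_idx)
			ctxs[i].context_reset_status = S_GUILTY;
		else if (vram_lost)
			ctxs[i].context_reset_status = S_INNOCENT;
		/* otherwise left at S_NO_RESET: context unaffected */
	}
}
```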


Cheers,
Nicolai




*From:*Haehnle, Nicolai
*Sent:* Wednesday, October 11, 2017 4:41 PM
*To:* Liu, Monk ; Koenig, Christian 
; Olsak, Marek ; Deucher, 
Alexander 
*Cc:* amd-gfx@lists.freedesktop.org; Ding, Pixel ; 
Jiang, Jerry (SW) ; Li, Bingley 
; Ramirez, Alejandro ; 
Filipas, Mario 

*Subject:* Re: TDR and VRAM lost handling in KMD:

 From a Mesa perspective, this almost all sounds reasonable to me.

On "guilty": "guilty" is a term that's used by APIs (e.g. OpenGL), so 
it's reasonable to use it. However, it /does not/ make sense to mark 
idle contexts as "guilty" just because VRAM is lost. VRAM lost is a 
perfect example where the driver should report context lost to 
applications with the "innocent" flag for contexts that were idle at the 
time of reset. The only context(s) that should be reported as "guilty" 
(or perhaps "unknown" in some cases) are the ones that were executing at 
the time of reset.


On whether the whole context is marked as guilty from a user space 
perspective, it would simply be nice for user space to get consistent 
answers. It would be a bit odd if we could e.g. succeed in submitting an 
SDMA job after a GFX job was rejected. This would point in favor of 
marking the entire context as guilty (although that could happen lazily 
instead of at reset time). On the other hand, if that's too big a burden 
for the kernel implementation I'm sure we can live without it.


Cheers,

Nicolai



*From:*Liu, Monk
*Sent:* Wednesday, October 11, 2017 10:15:40 AM
*To:* Koenig, Christian; Haehnle, Nicolai; Olsak, Marek; Deucher, Alexander
*Cc:* amd-gfx@lists.freedesktop.org 
; Ding, Pixel; Jiang, Jerry (SW); 
Li, Bingley; Ramirez, Alejandro; Filipas, Mario

*Subject:* RE: TDR and VRAM lost handling in KMD:

1.Set its fence error status to “*ETIME*”,

No, as I already explained ETIME is for synchronous operation.

In other words when we return ETIME from the wait IOCTL it would mean 
that the waiting has somehow timed out, but not the job we waited for.


Please use ECANCELED as well or some other error code when we find that 
we need to distinct the timedout job from the canceled ones (probably a 
good idea, but I'm not sure).


[ML] I’m okay if you insist not to use ETIME

1.Find the entity/ctx behind this job, and set this ctx as “*guilty*”

Not sure. Do we want to set the whole context as guilty or just the entity?

Setting the whole contexts as guilty sounds racy to me.

BTW: We should use a different name than "guilty", maybe just "bool 
canceled;" ?


[ML] I think context is better than entity, because for example if you 
only block entity_0 of context and allow entity_N run, that means the

Re: [PATCH 09/12] drm/amdgpu/sriov:return -ENODEV if gpu reseted

2017-10-10 Thread Nicolai Hähnle

On 10.10.2017 10:21, Liu, Monk wrote:

As Nicolai described before:


The kernel will reject all command submission from contexts which were created 
before the VRAM loss happened.

ML: this is similar to what my strict-mode reset does ☹, except that my logic 
checks if the FD was opened before the GPU reset, while Nicolai checks if the 
context was created before the VRAM loss.
But comparing context creation timing with VRAM lost timing is not accurate 
enough, like I said before:
Even if a context was created after the VRAM loss, that doesn't mean you can 
allow this context to submit jobs, because some BO in the BO_LIST passed in with 
this context may have been modified before the GPU reset/VRAM loss.
So the safest way is comparing against the timing of the FD opening

We expose the VRAM lost counter to userspace. When Mesa sees that a command 
submission is rejected, it will query the VRAM lost counter and declare all 
resources which were in VRAM at this moment as invalidated.
E.g. shader binaries, texture descriptors, etc. will be re-uploaded when they 
are used the next time.

The application needs to recreate its GL context, just in the same way as it 
would if we found this context guilty of causing a reset.


Yes, for most applications. But this is *entirely* unrelated to 
re-opening the FD.


With OpenGL robustness contexts, what happens is that

1. Driver & application detect "context lost"
2. Application destroys the OpenGL context
--> driver destroys the kernel context and all associated buffer objects
3. Application creates a new OpenGL context
--> driver creates a new kernel context and new buffer objects

In this sequence, the content of buffer objects created before the VRAM 
lost is irrelevant because they will all be destroyed, and new command 
submissions will only use "fresh" buffer objects.


(Or, in the case of Mesa, it's actually possible that we re-use buffer 
objects from before VRAM lost due to the caching we do for performance, 
but the contents of those buffer objects will have been completely 
re-initialized.)
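The three-step recovery sequence above can be modeled with a toy generation counter. In a real application the detection in step 1 would come from glGetGraphicsResetStatus (GL_KHR_robustness); everything here is a self-contained mock with made-up names.

```c
#include <assert.h>

enum ctx_state { STATUS_OK, STATUS_LOST };

/* A context snapshots the device "generation" it was created under,
 * playing the role of the VRAM-lost counter. */
struct gl_ctx { int generation; };

static int device_generation = 0; /* bumped on each GPU reset */

/* Step 1: detect "context lost" (mock of glGetGraphicsResetStatus). */
static enum ctx_state get_reset_status(const struct gl_ctx *ctx)
{
	return ctx->generation == device_generation ? STATUS_OK : STATUS_LOST;
}

/* Steps 2-3: destroying and re-creating the context yields fresh state
 * and fresh buffer objects, so old VRAM contents are irrelevant. */
static struct gl_ctx create_context(void)
{
	return (struct gl_ctx){ .generation = device_generation };
}
```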


The FD simply isn't the right unit of granularity to track this.

Cheers,
Nicolai





-Original Message-
From: Koenig, Christian
Sent: October 10, 2017 15:26
To: Liu, Monk ; Nicolai Hähnle ; 
amd-gfx@lists.freedesktop.org; Daenzer, Michel 
Subject: Re: [PATCH 09/12] drm/amdgpu/sriov:return -ENODEV if gpu reseted

As Nicolai described before:

The kernel will reject all command submissions from contexts which were created 
before the VRAM lost happened.

We expose the VRAM lost counter to userspace. When Mesa sees that a command 
submission is rejected it will query the VRAM lost counter and declare all 
resources which were in VRAM at this moment as invalidated.
E.g. shader binaries, texture descriptors, etc. will be re-uploaded when they 
are used the next time.

The application needs to recreate its GL context, just in the same way as it 
would if we found this context guilty of causing a reset.

You should be able to handle this the same way in Vulkan, and I think we can 
expose the GPU reset counter to userspace as well. This way you can implement 
the strict mode in userspace and don't need to affect all applications with 
it. In other words, the effect will be limited to the Vulkan stack.
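Implementing strict mode purely in userspace on top of an exposed reset counter, as suggested here, could look roughly like the following. This is a hedged sketch with invented names; a real Vulkan winsys would query the kernel counter through the amdgpu info ioctl rather than reading a plain variable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Invented names, for illustration only. */
struct winsys {
    uint32_t baseline_reset_counter; /* value observed at device creation */
    bool     device_lost;
};

/* Stand-in for an AMDGPU_INFO query of the kernel's reset counter. */
static uint32_t query_gpu_reset_counter(uint32_t kernel_counter)
{
    return kernel_counter;
}

/* Userspace strict mode: once a reset is observed, refuse all further
 * submissions from this device handle, with no kernel-side policy. */
static int winsys_submit(struct winsys *ws, uint32_t kernel_counter)
{
    if (ws->device_lost ||
        query_gpu_reset_counter(kernel_counter) != ws->baseline_reset_counter) {
        ws->device_lost = true;
        return -ECANCELED;
    }
    return 0; /* the real CS ioctl would happen here */
}
```

Because the policy lives in the winsys, only the Vulkan stack observes the strict behavior; other users of the same fd are unaffected.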

Regards,
Christian.

Am 10.10.2017 um 09:12 schrieb Liu, Monk:

Then the question is how we treat recovery if VRAM is lost?

-Original Message-
From: Koenig, Christian
Sent: October 10, 2017 14:59
To: Liu, Monk ; Nicolai Hähnle ;
amd-gfx@lists.freedesktop.org; Daenzer, Michel

Subject: Re: [PATCH 09/12] drm/amdgpu/sriov:return -ENODEV if gpu
reseted

As Nicolai explained that approach simply won't work.

The fd is used by more than just the closed-source Vulkan driver, and I think 
even by some components not developed by AMD (common X code?
Michel, please comment as well).

So closing it and reopening it to handle a GPU reset is simply not an option.

Regards,
Christian.

Am 10.10.2017 um 06:26 schrieb Liu, Monk:

After VRAM loss happens, all clients, no matter radv/mesa/ogl, are
useless.

Any driver using this FD should be denied by the KMD after VRAM loss, and
the UMD can destroy/close this FD, re-open it, and rebuild all
resources.

That's the only option for the VRAM lost case.



-Original Message-
From: Nicolai Hähnle [mailto:nhaeh...@gmail.com]
Sent: October 9, 2017 19:01
To: Liu, Monk ; Koenig, Christian
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 09/12] drm/amdgpu/sriov:return -ENODEV if gpu
reseted

On 09.10.2017 10:35, Liu, Monk wrote:

Please be aware that this policy is what the strict mode defines and
what the customer wants. And also please check the VK spec: it defines
that after a GPU reset all VkInstances should close/release their
resources/devices/contexts and all buffers, and re-init the VkInstance
after the GPU reset.

Sorry, but you simply cannot implement a correct user-space implementation of 
those specs on top of this.

It will break as soon as you 

Re: [PATCH 4/4] drm/amdgpu: set -ECANCELED when dropping jobs

2017-10-09 Thread Nicolai Hähnle

On 09.10.2017 17:34, Christian König wrote:

From: Christian König 

And return from the wait functions the fence error code.

Signed-off-by: Christian König 


For the series:

Reviewed-by: Nicolai Hähnle 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 7 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 1 +
  2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 359c89c..0185d35 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1283,6 +1283,8 @@ int amdgpu_cs_wait_ioctl(struct drm_device *dev, void 
*data,
r = PTR_ERR(fence);
else if (fence) {
r = dma_fence_wait_timeout(fence, true, timeout);
+   if (r > 0 && fence->error)
+   r = fence->error;
dma_fence_put(fence);
} else
r = 1;
@@ -1420,6 +1422,9 @@ static int amdgpu_cs_wait_all_fences(struct amdgpu_device 
*adev,
  
  		if (r == 0)

break;
+
+   if (fence->error)
+   return fence->error;
}
  
  	memset(wait, 0, sizeof(*wait));

@@ -1480,7 +1485,7 @@ static int amdgpu_cs_wait_any_fence(struct amdgpu_device 
*adev,
wait->out.status = (r > 0);
wait->out.first_signaled = first;
/* set return value 0 to indicate success */
-   r = 0;
+   r = array[first]->error;
  
  err_free_fence_array:

for (i = 0; i < fence_count; i++)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index c76d17c..7067edf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -194,6 +194,7 @@ static struct dma_fence *amdgpu_job_run(struct 
amd_sched_job *sched_job)
trace_amdgpu_sched_run_job(job);
/* skip ib schedule when vram is lost */
if (job->vram_lost_counter != atomic_read(&adev->vram_lost_counter)) {
+   dma_fence_set_error(&job->base.s_fence->finished, -ECANCELED);
DRM_ERROR("Skip scheduling IBs!\n");
} else {
r = amdgpu_ib_schedule(job->ring, job->num_ibs, job->ibs, job,
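The essence of the patch above is that a successful wait must still surface an error the scheduler recorded on the fence. A hedged, self-contained model of that rule (not the real dma_fence API; names invented):

```c
#include <assert.h>
#include <errno.h>

/* Minimal stand-in for a dma_fence; names invented for illustration. */
struct fence_model {
    int error;   /* as set via dma_fence_set_error(), e.g. -ECANCELED */
};

/* Mirrors the patched wait paths: a positive wait result (fence
 * signaled) is replaced by the fence's error code when one was set. */
static long fence_wait_model(const struct fence_model *f, long wait_result)
{
    if (wait_result > 0 && f->error)
        return f->error;
    return wait_result;
}
```

This is why userspace waiting on a job dropped after VRAM loss sees -ECANCELED instead of a clean signal.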




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add VRAM lost query

2017-10-09 Thread Nicolai Hähnle

On 09.10.2017 17:53, Christian König wrote:

From: Christian König 

Allows userspace to figure out if VRAM was lost.

Signed-off-by: Christian König 


Yes, I think we can use this in Mesa. We'll need to actually code this 
up, but for now this patch is:


Reviewed-by: Nicolai Hähnle 



---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +++
  include/uapi/drm/amdgpu_drm.h   | 1 +
  2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 0fc36b2..49cc496 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -762,6 +762,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file
}
return copy_to_user(out, &ui32, min(size, 4u)) ? -EFAULT : 0;
}
+   case AMDGPU_INFO_VRAM_LOST_COUNTER:
+   ui32 = atomic_read(&adev->vram_lost_counter);
+   return copy_to_user(out, &ui32, min(size, 4u)) ? -EFAULT : 0;
default:
DRM_DEBUG_KMS("Invalid request %d\n", info->query);
return -EINVAL;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 3bf41a6..de0a2ac 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -632,6 +632,7 @@ struct drm_amdgpu_cs_chunk_data {
#define AMDGPU_INFO_SENSOR_VDDGFX   0x7
  /* Number of VRAM page faults on CPU access. */
  #define AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS  0x1E
+#define AMDGPU_INFO_VRAM_LOST_COUNTER  0x1F
  
  #define AMDGPU_INFO_MMR_SE_INDEX_SHIFT	0

  #define AMDGPU_INFO_MMR_SE_INDEX_MASK 0xff
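A rough model of the dispatch path this patch adds, for readers following along without the full amdgpu_info_ioctl context. This is a hedged sketch: the real code uses atomic_read() and copy_to_user(); here they are replaced by a plain variable and memcpy, and everything except AMDGPU_INFO_VRAM_LOST_COUNTER is invented.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define AMDGPU_INFO_VRAM_LOST_COUNTER 0x1F

/* Model of the new query branch; mirrors the min(size, 4u) copy and
 * the -EINVAL fallback from the patch. */
static int info_query_model(unsigned query, uint32_t vram_lost_counter,
                            void *out, unsigned size)
{
    switch (query) {
    case AMDGPU_INFO_VRAM_LOST_COUNTER: {
        uint32_t ui32 = vram_lost_counter;
        memcpy(out, &ui32, size < 4u ? size : 4u);
        return 0;
    }
    default:
        return -EINVAL;
    }
}
```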






Re: [PATCH] drm/amdgpu: revert VRAM lost handling

2017-10-09 Thread Nicolai Hähnle

On 09.10.2017 10:16, Christian König wrote:

From: Christian König 

Revert "drm/amdgpu: skip all jobs of guilty vm" and
"drm/amdgpu: return -ENODEV to user space when vram is lost v2"

Forcing userspace to restart without a chance to recover in case of a GPU reset
doesn't make much sense and just completely breaks GPU reset handling and makes
the system unusable after a reset.

Signed-off-by: Christian König 


Acked-by: Nicolai Hähnle 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  4 
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 14 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c|  5 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 15 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 10 --
  6 files changed, 5 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 71e971f..81dd5ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -772,7 +772,6 @@ struct amdgpu_fpriv {
struct mutexbo_list_lock;
struct idr  bo_list_handles;
struct amdgpu_ctx_mgr   ctx_mgr;
-   u32 vram_lost_counter;
  };
  
  /*

@@ -1501,7 +1500,6 @@ struct amdgpu_device {
atomic64_t  num_evictions;
atomic64_t  num_vram_cpu_page_faults;
atomic_tgpu_reset_counter;
-   atomic_tvram_lost_counter;
  
  	/* data for buffer migration throttling */

struct {
@@ -1845,8 +1843,6 @@ static inline bool amdgpu_has_atpx(void) { return false; }
  extern const struct drm_ioctl_desc amdgpu_ioctls_kms[];
  extern const int amdgpu_max_kms_ioctl;
  
-bool amdgpu_kms_vram_lost(struct amdgpu_device *adev,

- struct amdgpu_fpriv *fpriv);
  int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags);
  void amdgpu_driver_unload_kms(struct drm_device *dev);
  void amdgpu_driver_lastclose_kms(struct drm_device *dev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index ab83dfc..adb0c1c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1189,7 +1189,6 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
  int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
  {
struct amdgpu_device *adev = dev->dev_private;
-   struct amdgpu_fpriv *fpriv = filp->driver_priv;
union drm_amdgpu_cs *cs = data;
struct amdgpu_cs_parser parser = {};
bool reserved_buffers = false;
@@ -1197,8 +1196,6 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
  
  	if (!adev->accel_working)

return -EBUSY;
-   if (amdgpu_kms_vram_lost(adev, fpriv))
-   return -ENODEV;
  
  	parser.adev = adev;

parser.filp = filp;
@@ -1257,16 +1254,12 @@ int amdgpu_cs_wait_ioctl(struct drm_device *dev, void 
*data,
  {
union drm_amdgpu_wait_cs *wait = data;
struct amdgpu_device *adev = dev->dev_private;
-   struct amdgpu_fpriv *fpriv = filp->driver_priv;
unsigned long timeout = amdgpu_gem_timeout(wait->in.timeout);
struct amdgpu_ring *ring = NULL;
struct amdgpu_ctx *ctx;
struct dma_fence *fence;
long r;
  
-	if (amdgpu_kms_vram_lost(adev, fpriv))

-   return -ENODEV;
-
ctx = amdgpu_ctx_get(filp->driver_priv, wait->in.ctx_id);
if (ctx == NULL)
return -EINVAL;
@@ -1335,16 +1328,12 @@ int amdgpu_cs_fence_to_handle_ioctl(struct drm_device 
*dev, void *data,
struct drm_file *filp)
  {
struct amdgpu_device *adev = dev->dev_private;
-   struct amdgpu_fpriv *fpriv = filp->driver_priv;
union drm_amdgpu_fence_to_handle *info = data;
struct dma_fence *fence;
struct drm_syncobj *syncobj;
struct sync_file *sync_file;
int fd, r;
  
-	if (amdgpu_kms_vram_lost(adev, fpriv))

-   return -ENODEV;
-
fence = amdgpu_cs_get_fence(adev, filp, &info->in.fence);
if (IS_ERR(fence))
return PTR_ERR(fence);
@@ -1506,15 +1495,12 @@ int amdgpu_cs_wait_fences_ioctl(struct drm_device *dev, 
void *data,
struct drm_file *filp)
  {
struct amdgpu_device *adev = dev->dev_private;
-   struct amdgpu_fpriv *fpriv = filp->driver_priv;
union drm_amdgpu_wait_fences *wait = data;
uint32_t fence_count = wait->in.fence_count;
struct drm_amdgpu_fence *fences_user;
struct drm_amdgpu_fence *fences;
int r;
  
-	if (amdgpu_kms_vram_lost(adev, fpriv))

-   return -ENODEV;
/* Get 

Re: [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini

2017-10-09 Thread Nicolai Hähnle
It depends on what you mean by "handle". If amdgpu_cs_submit_raw were to 
return ECANCELED, the correct error message would be printed.


We don't do any of the "trying to continue" business because back when 
we last discussed that we said that it wasn't such a great idea, and to 
be honest, it really isn't a great idea for normal applications. For the 
X server / compositor it could be valuable though.


Cheers,
Nicolai

On 09.10.2017 15:57, Olsak, Marek wrote:
Mesa does not handle -ECANCELED. It only returns -ECANCELED from the 
Mesa winsys layer if the CS ioctl wasn't called (because the context is 
already lost and so the winsys doesn't submit further CS ioctls).



When the CS ioctl fails for the first time, the kernel error is returned 
and the context is marked as "lost".


The next command submission is automatically dropped by the winsys and 
it returns -ECANCELED.



Marek


*From:* Haehnle, Nicolai
*Sent:* Monday, October 9, 2017 2:58:02 PM
*To:* Koenig, Christian; Liu, Monk; Nicolai Hähnle; 
amd-gfx@lists.freedesktop.org; Olsak, Marek

*Cc:* Li, Bingley
*Subject:* Re: [PATCH 5/5] drm/amd/sched: signal and free remaining 
fences in amd_sched_entity_fini

On 09.10.2017 14:33, Christian König wrote:

Am 09.10.2017 um 13:27 schrieb Nicolai Hähnle:

On 09.10.2017 13:12, Christian König wrote:


Nicolai, how hard would it be to handle ENODEV as failure for all 
currently existing contexts?


Impossible? "All currently existing contexts" is not a well-defined 
concept when multiple drivers co-exist in the same process.


Ok, let me refine the question: I assume there are resources "shared" 
between contexts like binary shader code for example which needs to 
be reuploaded when VRAM is lost.


How hard would it be to handle that correctly?


Okay, that makes more sense :)

With the current interface it's still pretty difficult, but if we 
could get a new per-device query ioctl which returns a "VRAM loss 
counter", it would be reasonably straight-forward.


The problem with the VRAM lost counter is that this isn't safe either. 
E.g. you could have an application which uploads shaders, a GPU reset 
happens and VRAM is lost, and then the application creates a new context 
and makes submissions with broken shader binaries.


Hmm. Here's how I imagined we'd be using a VRAM lost counter:

int si_shader_binary_upload(...)
{
     ...
     shader->bo_vram_lost_counter = sscreen->vram_lost_counter;
     shader->bo = pipe_buffer_create(...);
     ptr = sscreen->b.ws->buffer_map(shader->bo->buf, ...);
     ... copies ...
     sscreen->b.ws->buffer_unmap(shader->bo->buf);
}

int si_shader_select(...)
{
     ...
     r = si_shader_select_with_key(ctx->sscreen, state, ...);
     if (r) return r;

     if (state->current->bo_vram_lost_counter !=
     ctx->sscreen->vram_lost_counter) {
    ... re-upload sequence ...
     }
}

(Not shown: logic that compares ctx->vram_lost_counter with
sscreen->vram_lost_counter and forces a re-validation of all state
including shaders.)

That should cover this scenario, shouldn't it?

Oh... I see one problem. But it should be easy to fix: when creating a
new amdgpu context, Mesa needs to query the vram lost counter. So then
the sequence of events would be either:

- VRAM lost counter starts at 0
- Mesa uploads a shader binary
- Unrelated GPU reset happens, kernel increments VRAM lost counter to 1
- Mesa creates a new amdgpu context, queries the VRAM lost counter --> 1
- si_screen::vram_lost_counter is updated to 1
- Draw happens on the new context --> si_shader_select will catch the
VRAM loss

Or:

- VRAM lost counter starts at 0
- Mesa uploads a shader binary
- Mesa creates a new amdgpu context, VRAM lost counter still 0
- Unrelated GPU reset happens, kernel increments VRAM lost counter to 1
- Draw happens on the new context and proceeds normally
...
- Mesa flushes the CS, and the kernel will return an error code because
the device VRAM lost counter is different from the amdgpu context VRAM
lost counter
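Both orderings above can be enacted with a small model holding the four counter copies involved (device, si_screen, amdgpu context, per-shader BO). This is a hedged, simplified sketch with invented names; it only demonstrates that stale shader state is caught either by si_shader_select or by the kernel's CS check, never silently used.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Invented model of the four VRAM-lost counter copies discussed. */
struct model {
    uint32_t device;   /* kernel's vram_lost_counter */
    uint32_t screen;   /* si_screen::vram_lost_counter shadow */
    uint32_t ctx;      /* snapshot taken at amdgpu context creation */
    uint32_t shader;   /* counter recorded at shader upload */
};

/* True when si_shader_select would trigger the re-upload sequence. */
static bool needs_reupload(const struct model *m)
{
    return m->shader != m->screen;
}

/* The kernel rejects the CS when the context predates the VRAM loss. */
static int cs_flush(const struct model *m)
{
    return m->ctx != m->device ? -ECANCELED : 0;
}
```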


So I would still vote for a separate IOCTL to reset the VRAM lost state 
which is called *before* user space starts to re-upload 
shaders/descriptors etc...


The question is: is that separate IOCTL per-context or per-fd? If it's
per-fd, then it's not compatible with multiple drivers. If it's
per-context, then I don't see how it helps. Perhaps you could explain?


  > This way you also catch the case when another reset happens while you
  > re-upload things.

My assumption would be that the re-upload happens *after* the new amdgpu
context is created. Then the repeat reset should be caught by the kernel
when we try to submit a CS on the new context (this is assuming that the
kernel context's vram lost c

Re: [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini

2017-10-09 Thread Nicolai Hähnle

On 09.10.2017 13:12, Christian König wrote:


Nicolai, how hard would it be to handle ENODEV as failure for all 
currently existing contexts?


Impossible? "All currently existing contexts" is not a well-defined 
concept when multiple drivers co-exist in the same process.


Ok, let me refine the question: I assume there are resources "shared" 
between contexts like binary shader code for example which needs to be 
reuploaded when VRAM is lost.


How hard would it be to handle that correctly?


Okay, that makes more sense :)

With the current interface it's still pretty difficult, but if we could 
get a new per-device query ioctl which returns a "VRAM loss counter", it 
would be reasonably straight-forward.



And what would be the purpose of this? If it's to support VRAM loss, 
having a per-context VRAM loss counter would enable each context to 
signal ECANCELED separately.


I thought of that on top of the -ENODEV handling.

In other words when we see -ENODEV we call an IOCTL to let the kernel 
know we noticed that something is wrong and then reinit all shared 
resources in userspace.


All existing context will still see -ECANCELED when we drop their 
command submission, but new contexts would at least not cause a new 
lockup immediately because their shader binaries are corrupted.


I don't think we need -ENODEV for this. We just need -ECANCELED to be 
returned when a submission is rejected due to reset (hang or VRAM loss).


Mesa would keep a shadow of the VRAM loss counter in pipe_screen and 
pipe_context, and query the kernel's counter when it encounters 
-ECANCELED. Each context would then know to drop the CS it's built up so 
far and restart based on comparing the VRAM loss counter of pipe_screen 
to that of pipe_context, and similarly we could keep a copy of the VRAM 
loss counter for important buffer objects like shader binaries, 
descriptors, etc.
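The shadow-counter scheme described in the last two paragraphs can be sketched as follows. This is a hedged model with invented names (Mesa's real pipe_screen/pipe_context are far richer); it shows only the recovery step: on -ECANCELED, refresh the screen shadow from the kernel and drop the context's built-up CS if its own shadow is stale.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Invented stand-ins for the pipe_screen / pipe_context shadows. */
struct screen_shadow  { uint32_t vram_lost_counter; };
struct context_shadow { uint32_t vram_lost_counter; bool cs_dropped; };

static void handle_submit_result(int r, uint32_t kernel_counter,
                                 struct screen_shadow *scr,
                                 struct context_shadow *ctx)
{
    if (r != -ECANCELED)
        return;
    /* Query the kernel's counter and refresh the screen shadow. */
    scr->vram_lost_counter = kernel_counter;
    if (ctx->vram_lost_counter != scr->vram_lost_counter) {
        ctx->cs_dropped = true;  /* drop the CS built up so far */
        ctx->vram_lost_counter = scr->vram_lost_counter;
    }
}
```

Buffer objects like shader binaries would carry their own counter copy and be compared against the screen shadow on use, as in the si_shader_select sketch earlier in the thread.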


This seems more robust to me than relying only on an ENODEV. We'd most 
likely keep some kind of VRAM loss counter in Mesa *anyway* (we don't 
maintain a list of all shaders, for example, and we can't nuke important 
per-context state across threads), and synthesizing such a counter from 
ENODEVs is not particularly robust (what if multiple ENODEVs occur for 
the same loss event?).


BTW, I still don't like ENODEV. It seems more like the kind of error 
code you'd return with hot-pluggable GPUs where the device can 
physically disappear...


Cheers,
Nicolai




Regards,
Christian.

Am 09.10.2017 um 13:04 schrieb Nicolai Hähnle:

On 09.10.2017 12:59, Christian König wrote:
Nicolai, how hard would it be to handle ENODEV as failure for all 
currently existing contexts?


Impossible? "All currently existing contexts" is not a well-defined 
concept when multiple drivers co-exist in the same process.


And what would be the purpose of this? If it's to support VRAM loss, 
having a per-context VRAM loss counter would enable each context to 
signal ECANCELED separately.


Cheers,
Nicolai




Monk, would it be ok with you when we return ENODEV only for existing 
context when VRAM is lost and/or we have a strict mode GPU reset? 
E.g. newly created contexts would continue work as they should.


Regards,
Christian.

Am 09.10.2017 um 12:49 schrieb Nicolai Hähnle:

Hi Monk,

Yes, you're right, we're only using ECANCELED internally. But as a 
consequence, Mesa would already handle a kernel error of ECANCELED 
on context loss correctly :)


Cheers,
Nicolai

On 09.10.2017 12:35, Liu, Monk wrote:

Hi Christian

You rejected some of my patches that return -ENODEV, on the grounds 
that Mesa doesn't do any handling of -ENODEV


But if Nicolai can confirm that Mesa does have handling for 
-ECANCELED, then we need to align our error codes overall. In detail, 
the IOCTLs below can return error codes:


Amdgpu_cs_ioctl
Amdgpu_cs_wait_ioctl
Amdgpu_cs_wait_fences_ioctl
Amdgpu_info_ioctl


My patches do:
return -ENODEV from cs_ioctl if the context is detected as guilty,
also return -ENODEV from cs_wait|cs_wait_fences if the fence is 
signaled but with error -ETIME,
also return -ENODEV from info_ioctl so the UMD can query whether a GPU 
reset happened after the process was created (because in strict mode we 
block the process instead of the context)



according to Nicolai:

amdgpu_cs_ioctl *can* return -ECANCELED, but frankly speaking, the 
kernel side doesn't return "-ECANCELED" anywhere, so this solution on 
the Mesa side doesn't align with the *current* amdgpu driver,
which only returns 0 on success or -EINVAL on other errors, but 
definitely no "-ECANCELED" error code,


so if we are talking about community rules, we shouldn't let Mesa handle 
-ECANCELED; we should have a unified error code


+ Marek

BR Monk




-Original Message-
From: Haehnle, Nicolai
Sent: October 9, 2017 18:14
To: Koenig, Christian ; Liu, Monk 
; Nicolai Hähnle ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [

Re: [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini

2017-10-09 Thread Nicolai Hähnle

On 09.10.2017 14:33, Christian König wrote:

Am 09.10.2017 um 13:27 schrieb Nicolai Hähnle:

On 09.10.2017 13:12, Christian König wrote:


Nicolai, how hard would it be to handle ENODEV as failure for all 
currently existing contexts?


Impossible? "All currently existing contexts" is not a well-defined 
concept when multiple drivers co-exist in the same process.


Ok, let me refine the question: I assume there are resources "shared" 
between contexts like binary shader code for example which needs to 
be reuploaded when VRAM is lost.


How hard would it be to handle that correctly?


Okay, that makes more sense :)

With the current interface it's still pretty difficult, but if we 
could get a new per-device query ioctl which returns a "VRAM loss 
counter", it would be reasonably straight-forward.


The problem with the VRAM lost counter is that this isn't safe either. 
E.g. you could have an application which uploads shaders, a GPU reset 
happens and VRAM is lost, and then the application creates a new context 
and makes submissions with broken shader binaries.


Hmm. Here's how I imagined we'd be using a VRAM lost counter:

int si_shader_binary_upload(...)
{
   ...
   shader->bo_vram_lost_counter = sscreen->vram_lost_counter;
   shader->bo = pipe_buffer_create(...);
   ptr = sscreen->b.ws->buffer_map(shader->bo->buf, ...);
   ... copies ...
   sscreen->b.ws->buffer_unmap(shader->bo->buf);
}

int si_shader_select(...)
{
   ...
   r = si_shader_select_with_key(ctx->sscreen, state, ...);
   if (r) return r;

   if (state->current->bo_vram_lost_counter !=
   ctx->sscreen->vram_lost_counter) {
  ... re-upload sequence ...
   }
}

(Not shown: logic that compares ctx->vram_lost_counter with 
sscreen->vram_lost_counter and forces a re-validation of all state 
including shaders.)


That should cover this scenario, shouldn't it?

Oh... I see one problem. But it should be easy to fix: when creating a 
new amdgpu context, Mesa needs to query the vram lost counter. So then 
the sequence of events would be either:


- VRAM lost counter starts at 0
- Mesa uploads a shader binary
- Unrelated GPU reset happens, kernel increments VRAM lost counter to 1
- Mesa creates a new amdgpu context, queries the VRAM lost counter --> 1
- si_screen::vram_lost_counter is updated to 1
- Draw happens on the new context --> si_shader_select will catch the 
VRAM loss


Or:

- VRAM lost counter starts at 0
- Mesa uploads a shader binary
- Mesa creates a new amdgpu context, VRAM lost counter still 0
- Unrelated GPU reset happens, kernel increments VRAM lost counter to 1
- Draw happens on the new context and proceeds normally
...
- Mesa flushes the CS, and the kernel will return an error code because 
the device VRAM lost counter is different from the amdgpu context VRAM 
lost counter



So I would still vote for a separate IOCTL to reset the VRAM lost state 
which is called *before* user space starts to re-upload 
shaders/descriptors etc...


The question is: is that separate IOCTL per-context or per-fd? If it's 
per-fd, then it's not compatible with multiple drivers. If it's 
per-context, then I don't see how it helps. Perhaps you could explain?



> This way you also catch the case when another reset happens while you
> re-upload things.

My assumption would be that the re-upload happens *after* the new amdgpu 
context is created. Then the repeat reset should be caught by the kernel 
when we try to submit a CS on the new context (this is assuming that the 
kernel context's vram lost counter is initialized properly when the 
context is created):


- Mesa prepares upload, sets shader->bo_vram_lost_counter to 0
- Mesa uploads a shader binary
- While doing this, a GPU reset happens[0], kernel increments device 
VRAM lost counter to 1

- Draw happens with the new shader, Mesa proceeds normally
...
- On flush / CS submit, the kernel detects the VRAM lost state and 
returns an error to Mesa


[0] Out of curiosity: What happens on the CPU side if the PCI / full 
ASIC reset method is used? Is there a time window where we could get a SEGV?



[snip]
BTW, I still don't like ENODEV. It seems more like the kind of error 
code you'd return with hot-pluggable GPUs where the device can 
physically disappear...


Yeah, ECANCELED sounds like a better alternative. But I think we should 
still somehow note the fatality of losing VRAM to userspace.


How about ENODATA or EBADFD?


According to the manpage, EBADFD is "File descriptor in bad state.". 
Sounds fitting :)


Cheers,
Nicolai




Regards,
Christian.



Cheers,
Nicolai




Regards,
Christian.

Am 09.10.2017 um 13:04 schrieb Nicolai Hähnle:

On 09.10.2017 12:59, Christian König wrote:
Nicolai, how hard would it be to handle ENODEV as failure for all 
currently existing contexts?


Impossible? "All currently existin

Re: [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini

2017-10-09 Thread Nicolai Hähnle

On 09.10.2017 12:59, Christian König wrote:
Nicolai, how hard would it be to handle ENODEV as failure for all 
currently existing contexts?


Impossible? "All currently existing contexts" is not a well-defined 
concept when multiple drivers co-exist in the same process.


And what would be the purpose of this? If it's to support VRAM loss, 
having a per-context VRAM loss counter would enable each context to 
signal ECANCELED separately.


Cheers,
Nicolai




Monk, would it be ok with you when we return ENODEV only for existing 
context when VRAM is lost and/or we have a strict mode GPU reset? E.g. 
newly created contexts would continue work as they should.


Regards,
Christian.

Am 09.10.2017 um 12:49 schrieb Nicolai Hähnle:

Hi Monk,

Yes, you're right, we're only using ECANCELED internally. But as a 
consequence, Mesa would already handle a kernel error of ECANCELED on 
context loss correctly :)


Cheers,
Nicolai

On 09.10.2017 12:35, Liu, Monk wrote:

Hi Christian

You rejected some of my patches that return -ENODEV, on the grounds 
that Mesa doesn't do any handling of -ENODEV


But if Nicolai can confirm that Mesa does have handling for 
-ECANCELED, then we need to align our error codes overall. In detail, 
the IOCTLs below can return error codes:


Amdgpu_cs_ioctl
Amdgpu_cs_wait_ioctl
Amdgpu_cs_wait_fences_ioctl
Amdgpu_info_ioctl


My patches do:
return -ENODEV from cs_ioctl if the context is detected as guilty,
also return -ENODEV from cs_wait|cs_wait_fences if the fence is 
signaled but with error -ETIME,
also return -ENODEV from info_ioctl so the UMD can query whether a GPU 
reset happened after the process was created (because in strict mode we 
block the process instead of the context)



according to Nicolai:

amdgpu_cs_ioctl *can* return -ECANCELED, but frankly speaking, the 
kernel side doesn't return "-ECANCELED" anywhere, so this solution on 
the Mesa side doesn't align with the *current* amdgpu driver,
which only returns 0 on success or -EINVAL on other errors, but 
definitely no "-ECANCELED" error code,


so if we are talking about community rules, we shouldn't let Mesa handle 
-ECANCELED; we should have a unified error code


+ Marek

BR Monk




-Original Message-
From: Haehnle, Nicolai
Sent: October 9, 2017 18:14
To: Koenig, Christian ; Liu, Monk 
; Nicolai Hähnle ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 5/5] drm/amd/sched: signal and free remaining 
fences in amd_sched_entity_fini


On 09.10.2017 10:02, Christian König wrote:

For gpu reset patches (already submitted to pub) I would make kernel
return -ENODEV if the waiting fence (in cs_wait or wait_fences IOCTL)
founded as error, that way UMD would run into robust extension path
and considering the GPU hang occurred,

Well that is only closed source behavior which is completely
irrelevant for upstream development.

As far as I know we haven't pushed the change to return -ENODEV 
upstream.


FWIW, radeonsi currently expects -ECANCELED on CS submissions and 
treats those as context lost. Perhaps we could use the same error on 
fences?

That makes more sense to me than -ENODEV.

Cheers,
Nicolai



Regards,
Christian.

Am 09.10.2017 um 08:42 schrieb Liu, Monk:

Christian


It would be really nice to have an error code set on
s_fence->finished before it is signaled, use dma_fence_set_error()
for this.

For gpu reset patches (already submitted to pub) I would make kernel
return -ENODEV if the waiting fence (in cs_wait or wait_fences IOCTL)
founded as error, that way UMD would run into robust extension path
and considering the GPU hang occurred,

Don't know if this is expected for the case of normal process being
killed or crashed like Nicolai hit ... since there is no gpu hang hit


BR Monk




-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On
Behalf Of Christian K?nig
Sent: September 28, 2017 23:01
To: Nicolai Hähnle ;
amd-gfx@lists.freedesktop.org
Cc: Haehnle, Nicolai 
Subject: Re: [PATCH 5/5] drm/amd/sched: signal and free remaining
fences in amd_sched_entity_fini

Am 28.09.2017 um 16:55 schrieb Nicolai Hähnle:

From: Nicolai Hähnle 

Highly concurrent Piglit runs can trigger a race condition where a
pending SDMA job on a buffer object is never executed because the
corresponding process is killed (perhaps due to a crash). Since the
job's fences were never signaled, the buffer object was effectively
leaked. Worse, the buffer was stuck wherever it happened to be at
the time, possibly in VRAM.

The symptom was user space processes stuck in interruptible waits
with kernel stacks like:

   [] dma_fence_default_wait+0x112/0x250
   [] dma_fence_wait_timeout+0x39/0xf0
   []
reservation_object_wait_timeout_rcu+0x1c2/0x300
   [] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0
[ttm]
   [] ttm_mem_evict_first+0xba/0x1a0 [ttm]
   [] ttm_bo_mem_space+0x341/0x4c0 [ttm]
   [] ttm_bo_validate+0xd4/0x150 [ttm]
   [] ttm_bo_init_r

Re: [PATCH 09/12] drm/amdgpu/sriov:return -ENODEV if gpu reseted

2017-10-09 Thread Nicolai Hähnle

On 09.10.2017 10:35, Liu, Monk wrote:

Please be aware that this policy is what the strict mode defines and what the 
customer wants.
And also please check the VK spec: it defines that after a GPU reset all VkInstances 
should close/release their resources/devices/contexts and all buffers, and 
re-init the VkInstance after the GPU reset


Sorry, but you simply cannot implement a correct user-space 
implementation of those specs on top of this.


It will break as soon as you have both OpenGL and Vulkan running in the 
same process (or heck, our Vulkan and radv :)), because both drivers 
will use the same fd.


Cheers,
Nicolai




So this whole approach is just aligned with the spec, and to avoid affecting the 
current Mesa/OGL clients, I put the whole approach into the 
strict mode.
And by default strict mode is not selected.


BR Monk

-Original Message-
From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
Sent: October 9, 2017 16:26
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 09/12] drm/amdgpu/sriov:return -ENODEV if gpu reseted

Am 30.09.2017 um 08:03 schrieb Monk Liu:

For SR-IOV strict-mode GPU reset:

In KMS open we record the latest adev->gpu_reset_counter in fpriv. We
return -ENODEV in cs_ioctl or info_ioctl if they find
fpriv->gpu_reset_counter != adev->gpu_reset_counter.

This way we prevent a potentially bad process/FD from submitting commands
and notify user space with -ENODEV.

User space should close all BOs/contexts and re-open the DRI FD to re-create
the virtual memory system for the process.


The whole approach is a NAK from my side.

We need to enable userspace to continue, not force it into process termination 
to recover. Otherwise we could send a SIGTERM in the first place.

Regards,
Christian.



Change-Id: Ib4c179f28a3d0783837566f29de07fc14aa9b9a4
Signed-off-by: Monk Liu 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 5 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 7 +++
   3 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index de9c164..b40d4ba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -772,6 +772,7 @@ struct amdgpu_fpriv {
struct idr  bo_list_handles;
struct amdgpu_ctx_mgr   ctx_mgr;
u32 vram_lost_counter;
+   int gpu_reset_counter;
   };
   
   /*

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 9467cf6..6a1515e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1199,6 +1199,11 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
if (amdgpu_kms_vram_lost(adev, fpriv))
return -ENODEV;
   
+	if (amdgpu_sriov_vf(adev) &&
+	    amdgpu_sriov_reset_level == 1 &&
+	    fpriv->gpu_reset_counter < atomic_read(&adev->gpu_reset_counter))
+		return -ENODEV;
+
parser.adev = adev;
parser.filp = filp;
   
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 282f45b..bd389cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -285,6 +285,11 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file
if (amdgpu_kms_vram_lost(adev, fpriv))
return -ENODEV;
   
+	if (amdgpu_sriov_vf(adev) &&
+	    amdgpu_sriov_reset_level == 1 &&
+	    fpriv->gpu_reset_counter < atomic_read(&adev->gpu_reset_counter))
+		return -ENODEV;
+
switch (info->query) {
case AMDGPU_INFO_ACCEL_WORKING:
ui32 = adev->accel_working;
@@ -824,6 +829,8 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
goto out_suspend;
}
   
+	fpriv->gpu_reset_counter = atomic_read(&adev->gpu_reset_counter);
+
r = amdgpu_vm_init(adev, &fpriv->vm,
   AMDGPU_VM_CONTEXT_GFX, 0);
if (r) {



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
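The counter-based "strict mode" gate discussed in the thread above can be modeled outside the kernel. The following is only a toy sketch of the idea (plain C, no kernel APIs; all names are invented for illustration): the FD snapshots the device's reset counter at open time, and every later ioctl compares the snapshot against the live counter.

```c
#include <assert.h>
#include <errno.h>

/* Toy model of the strict-mode reset gate; not kernel code. */
struct toy_device { int gpu_reset_counter; };
struct toy_fpriv  { int gpu_reset_counter; };

/* "open": snapshot the current reset counter into the per-FD state. */
void toy_open(const struct toy_device *adev, struct toy_fpriv *fpriv)
{
	fpriv->gpu_reset_counter = adev->gpu_reset_counter;
}

/* "cs_ioctl": reject submissions from FDs opened before the last reset. */
int toy_cs_ioctl(const struct toy_device *adev, const struct toy_fpriv *fpriv)
{
	if (fpriv->gpu_reset_counter < adev->gpu_reset_counter)
		return -ENODEV;	/* user space must re-open the FD */
	return 0;
}
```

The key design point mirrored here is that the staleness check is per-FD, which is why strict mode blocks the whole process rather than a single context.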


Re: [PATCH 07/12] drm/amdgpu/sriov:implement strict gpu reset

2017-10-09 Thread Nicolai Hähnle

On 30.09.2017 08:03, Monk Liu wrote:

Changes:
1) implement strict-mode SR-IOV GPU reset
2) always call sriov_gpu_reset_strict if the hypervisor notifies us of an FLR
3) in strict reset mode, set an error on all fences
4) change the fence_wait/cs_wait functions to return -ENODEV if the fence
signaled with error == -ETIME

After a strict GPU reset we consider the VRAM lost, and under that
assumption there is little point in recovering shadow BOs, because the
textures/resources/shaders cannot be recovered anyway (if they were
resident in VRAM).

Change-Id: I50d9b8b5185ba92f137f07c9deeac19d740d753b
Signed-off-by: Monk Liu 
---

[snip]

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 9efbb33..122e2e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2734,6 +2734,96 @@ static int amdgpu_recover_vram_from_shadow(struct 
amdgpu_device *adev,
  }
  
  /**

+ * amdgpu_sriov_gpu_reset_strict - reset the asic under strict mode
+ *
+ * @adev: amdgpu device pointer
+ * @job: the job that triggered the hang
+ *
+ * Attempt to reset the GPU if it has hung (all ASICs), for the SR-IOV case.
+ * Returns 0 for success or an error on failure.
+ *
+ * This function will deny all processes/fences created before this reset,
+ * and drop all jobs left unfinished by this reset.
+ *
+ * Applications should take responsibility for re-opening the FD to re-create
+ * the VM page table and recover all resources as well.


Total NAK to this. It is *completely* infeasible from the UMD side, 
because multiple drivers can simultaneously use the same FD.


The KMD should just drop all previously submitted jobs and let the UMD 
worry about whether it wants to re-use buffer objects or not.


The VM page table can then be rebuilt transparently based on whatever BO 
lists are used as new submissions are made after the reset.


Cheers,
Nicolai


Re: [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini

2017-10-09 Thread Nicolai Hähnle

Hi Monk,

Yes, you're right, we're only using ECANCELED internally. But as a 
consequence, Mesa would already handle a kernel error of ECANCELED on 
context loss correctly :)


Cheers,
Nicolai

On 09.10.2017 12:35, Liu, Monk wrote:

Hi Christian

You rejected some of my patches that return -ENODEV, on the grounds that Mesa 
doesn't handle -ENODEV.

But if Nicolai can confirm that Mesa does handle -ECANCELED, then we 
need to align our error codes overall. In detail, the following IOCTLs can return 
error codes:

amdgpu_cs_ioctl
amdgpu_cs_wait_ioctl
amdgpu_cs_wait_fences_ioctl
amdgpu_info_ioctl


My patches are:
return -ENODEV from cs_ioctl if the context is detected as guilty,
also return -ENODEV from cs_wait|cs_wait_fences if the fence signaled with 
error -ETIME,
also return -ENODEV from info_ioctl so the UMD can query whether a GPU reset 
happened after the process was created (because in strict mode we block the 
process instead of the context)


according to Nicolai:

amdgpu_cs_ioctl *can* return -ECANCELED, but frankly speaking, the kernel side 
doesn't return "-ECANCELED" anywhere, so this solution on the Mesa side doesn't 
align with the *current* amdgpu driver,
which only returns 0 on success or -EINVAL on error, but definitely no 
"-ECANCELED" error code.

So if we are talking about community rules, we shouldn't let Mesa handle 
-ECANCELED; we should have a unified error code.

+ Marek

BR Monk

  




-Original Message-
From: Haehnle, Nicolai
Sent: October 9, 2017 18:14
To: Koenig, Christian ; Liu, Monk ; 
Nicolai Hähnle ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 5/5] drm/amd/sched: signal and free remaining fences in 
amd_sched_entity_fini

On 09.10.2017 10:02, Christian König wrote:

For the GPU reset patches (already submitted publicly) I would make the kernel
return -ENODEV if the waiting fence (in the cs_wait or wait_fences IOCTL)
is found to have an error; that way the UMD would take the robustness-extension
path and consider that a GPU hang occurred.

Well, that is closed-source behavior only, which is completely
irrelevant to upstream development.

As far as I know we haven't pushed the change to return -ENODEV upstream.


FWIW, radeonsi currently expects -ECANCELED on CS submissions and treats those 
as context lost. Perhaps we could use the same error on fences?
That makes more sense to me than -ENODEV.

Cheers,
Nicolai



Regards,
Christian.

Am 09.10.2017 um 08:42 schrieb Liu, Monk:

Christian


It would be really nice to have an error code set on
s_fence->finished before it is signaled; use dma_fence_set_error()
for this.

For the GPU reset patches (already submitted publicly) I would make the kernel
return -ENODEV if the waiting fence (in the cs_wait or wait_fences IOCTL)
is found to have an error; that way the UMD would take the robustness-extension
path and consider that a GPU hang occurred.

I don't know if this is expected for the case of a normal process being
killed or crashing, like Nicolai hit ... since there is no GPU hang in that case.


BR Monk




-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On
Behalf Of Christian König
Sent: September 28, 2017 23:01
To: Nicolai Hähnle ;
amd-gfx@lists.freedesktop.org
Cc: Haehnle, Nicolai 
Subject: Re: [PATCH 5/5] drm/amd/sched: signal and free remaining
fences in amd_sched_entity_fini

Am 28.09.2017 um 16:55 schrieb Nicolai Hähnle:

From: Nicolai Hähnle 

Highly concurrent Piglit runs can trigger a race condition where a
pending SDMA job on a buffer object is never executed because the
corresponding process is killed (perhaps due to a crash). Since the
job's fences were never signaled, the buffer object was effectively
leaked. Worse, the buffer was stuck wherever it happened to be at
the time, possibly in VRAM.

The symptom was user space processes stuck in interruptible waits
with kernel stacks like:

   [] dma_fence_default_wait+0x112/0x250
   [] dma_fence_wait_timeout+0x39/0xf0
   []
reservation_object_wait_timeout_rcu+0x1c2/0x300
   [] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0
[ttm]
   [] ttm_mem_evict_first+0xba/0x1a0 [ttm]
   [] ttm_bo_mem_space+0x341/0x4c0 [ttm]
   [] ttm_bo_validate+0xd4/0x150 [ttm]
   [] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
   [] amdgpu_bo_create_restricted+0x1f3/0x470
[amdgpu]
   [] amdgpu_bo_create+0xda/0x220 [amdgpu]
   [] amdgpu_gem_object_create+0xaa/0x140
[amdgpu]
   [] amdgpu_gem_create_ioctl+0x97/0x120
[amdgpu]
   [] drm_ioctl+0x1fa/0x480 [drm]
   [] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
   [] do_vfs_ioctl+0xa3/0x5f0
   [] SyS_ioctl+0x79/0x90
   [] entry_SYSCALL_64_fastpath+0x1e/0xad
   [] 0x

Signed-off-by: Nicolai Hähnle 
Acked-by: Christian König 
---
    drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 7 ++-
    1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 54eb77cffd9b..32a
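The -ECANCELED handling Nicolai describes above — the UMD taking the robustness path and recreating its context — can be sketched as a stand-alone toy model. All names here are hypothetical (this is not Mesa or kernel code); the point is only the control flow on the user-space side.

```c
#include <assert.h>
#include <errno.h>

/* Toy model of UMD context-loss handling; names are made up. */
struct toy_ctx { int generation; int lost; };

/* Pretend kernel: submissions against a lost context fail with -ECANCELED. */
int toy_submit(struct toy_ctx *ctx)
{
	if (ctx->lost)
		return -ECANCELED;
	return 0;
}

/* UMD policy: on -ECANCELED, report context loss to the application
 * (robustness extension), recreate the context, then submit again. */
int toy_submit_robust(struct toy_ctx *ctx, int *was_lost)
{
	int r = toy_submit(ctx);
	if (r == -ECANCELED) {
		*was_lost = 1;		/* surfaced to the app as "device/context lost" */
		ctx->generation++;	/* recreate context state */
		ctx->lost = 0;
		r = toy_submit(ctx);
	}
	return r;
}
```

The design point under debate in the thread is exactly where this recovery happens: per context (as above) rather than by forcing the whole FD to be re-opened.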

Re: [PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini

2017-10-09 Thread Nicolai Hähnle

On 09.10.2017 10:02, Christian König wrote:
For the GPU reset patches (already submitted publicly) I would make the kernel 
return -ENODEV if the waiting fence (in the cs_wait or wait_fences IOCTL) 
is found to have an error; that way the UMD would take the robustness-extension 
path and consider that a GPU hang occurred.
Well, that is closed-source behavior only, which is completely irrelevant 
to upstream development.


As far as I know we haven't pushed the change to return -ENODEV upstream.


FWIW, radeonsi currently expects -ECANCELED on CS submissions and treats 
those as context lost. Perhaps we could use the same error on fences? 
That makes more sense to me than -ENODEV.


Cheers,
Nicolai



Regards,
Christian.

Am 09.10.2017 um 08:42 schrieb Liu, Monk:

Christian

It would be really nice to have an error code set on 
s_fence->finished before it is signaled; use dma_fence_set_error() 
for this.
For the GPU reset patches (already submitted publicly) I would make the kernel 
return -ENODEV if the waiting fence (in the cs_wait or wait_fences IOCTL) 
is found to have an error; that way the UMD would take the robustness-extension 
path and consider that a GPU hang occurred.


I don't know if this is expected for the case of a normal process being 
killed or crashing, like Nicolai hit ... since there is no GPU hang in that case.



BR Monk




-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf 
Of Christian König

Sent: September 28, 2017 23:01
To: Nicolai Hähnle ; amd-gfx@lists.freedesktop.org
Cc: Haehnle, Nicolai 
Subject: Re: [PATCH 5/5] drm/amd/sched: signal and free remaining 
fences in amd_sched_entity_fini


Am 28.09.2017 um 16:55 schrieb Nicolai Hähnle:

From: Nicolai Hähnle 

Highly concurrent Piglit runs can trigger a race condition where a
pending SDMA job on a buffer object is never executed because the
corresponding process is killed (perhaps due to a crash). Since the
job's fences were never signaled, the buffer object was effectively
leaked. Worse, the buffer was stuck wherever it happened to be at the 
time, possibly in VRAM.


The symptom was user space processes stuck in interruptible waits with
kernel stacks like:

  [] dma_fence_default_wait+0x112/0x250
  [] dma_fence_wait_timeout+0x39/0xf0
  [] 
reservation_object_wait_timeout_rcu+0x1c2/0x300
  [] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 
[ttm]

  [] ttm_mem_evict_first+0xba/0x1a0 [ttm]
  [] ttm_bo_mem_space+0x341/0x4c0 [ttm]
  [] ttm_bo_validate+0xd4/0x150 [ttm]
  [] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
  [] amdgpu_bo_create_restricted+0x1f3/0x470 
[amdgpu]

  [] amdgpu_bo_create+0xda/0x220 [amdgpu]
  [] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
  [] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
  [] drm_ioctl+0x1fa/0x480 [drm]
  [] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
  [] do_vfs_ioctl+0xa3/0x5f0
  [] SyS_ioctl+0x79/0x90
  [] entry_SYSCALL_64_fastpath+0x1e/0xad
  [] 0xffff

Signed-off-by: Nicolai Hähnle 
Acked-by: Christian König 
---
   drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 7 ++-
   1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 54eb77cffd9b..32a99e980d78 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -220,22 +220,27 @@ void amd_sched_entity_fini(struct 
amd_gpu_scheduler *sched,

   amd_sched_entity_is_idle(entity));
   amd_sched_rq_remove_entity(rq, entity);
   if (r) {
   struct amd_sched_job *job;
   /* Park the kernel for a moment to make sure it isn't processing
    * our entity.
    */
   kthread_park(sched->thread);
   kthread_unpark(sched->thread);
-    while (kfifo_out(&entity->job_queue, &job, sizeof(job)))
+    while (kfifo_out(&entity->job_queue, &job, sizeof(job))) {
+    struct amd_sched_fence *s_fence = job->s_fence;
+    amd_sched_fence_scheduled(s_fence);
It would be really nice to have an error code set on s_fence->finished 
before it is signaled, use dma_fence_set_error() for this.


Additional to that it would be nice to note in the subject line that 
this is a rather important bug fix.


With that fixed the whole series is Reviewed-by: Christian König 
.


Regards,
Christian.


+    amd_sched_fence_finished(s_fence);
+    dma_fence_put(&s_fence->finished);
   sched->ops->free_job(job);
+    }
   }
   kfifo_free(&entity->job_queue);
   }
   static void amd_sched_entity_wakeup(struct dma_fence *f, struct 
dma_fence_cb *cb)

   {
   struct amd_sched_entity *entity =
   container_of(cb, struct amd_sched_entity, cb);
   entity->dependency = NULL;


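Christian's request above — set an error on s_fence->finished before signaling it — follows the dma_fence convention that an error must be attached before the fence is signaled; waiters then observe the error instead of success. Below is a minimal stand-alone model of that ordering (toy structs, not the real dma_fence API):

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for struct dma_fence: a signaled flag plus an error code. */
struct toy_fence { int signaled; int error; };

/* Like dma_fence_set_error(): only meaningful before the fence signals. */
int toy_fence_set_error(struct toy_fence *f, int error)
{
	if (f->signaled)
		return -EINVAL;		/* too late; waiters may already have run */
	f->error = error;
	return 0;
}

void toy_fence_signal(struct toy_fence *f)
{
	f->signaled = 1;
}

/* Like a waiter: once signaled, the stored error is what the wait returns. */
int toy_fence_wait(const struct toy_fence *f)
{
	return f->signaled ? f->error : -EBUSY;
}
```

This is why the entity_fini cleanup loop has to set the error first and signal second: reversing the order would let a racing waiter see a clean completion for a job that was actually dropped.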

Re: [PATCH 4/5] drm/amd/sched: NULL out the s_fence field after run_job

2017-09-28 Thread Nicolai Hähnle

On 28.09.2017 20:39, Andres Rodriguez wrote:



On 2017-09-28 10:55 AM, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

amd_sched_process_job drops the fence reference, so NULL out the s_fence
field before adding it as a callback to guard against accidentally using
s_fence after it may have been freed.

Signed-off-by: Nicolai Hähnle 
Acked-by: Christian König 
---
  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c

index e793312e351c..54eb77cffd9b 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -604,20 +604,23 @@ static int amd_sched_main(void *param)
  if (!sched_job)
  continue;
  s_fence = sched_job->s_fence;
  atomic_inc(&sched->hw_rq_count);
  amd_sched_job_begin(sched_job);
  fence = sched->ops->run_job(sched_job);
  amd_sched_fence_scheduled(s_fence);
+
+    sched_job->s_fence = NULL;


Minor optional nitpick here. Could this be moved somewhere closer to 
where the fence reference is actually dropped? Alternatively, could a 
comment be added to specify which function call results in the reference 
ownership transfer?


Sure, I can add a comment. (It's amd_sched_process_job, which is called 
directly or indirectly in all the branches of the following if-statement.)




Whether a change is made or not, this series is
Reviewed-by: Andres Rodriguez 


Thanks.


Currently running piglit to check if this fixes the occasional soft 
hangs I was getting where all tests complete except one.


You may be running into this Mesa issue:

https://patchwork.freedesktop.org/patch/179535/

Cheers,
Nicolai





+
  if (fence) {
  s_fence->parent = dma_fence_get(fence);
  r = dma_fence_add_callback(fence, &s_fence->cb,
 amd_sched_process_job);
  if (r == -ENOENT)
  amd_sched_process_job(fence, &s_fence->cb);
  else if (r)
  DRM_ERROR("fence add callback failed (%d)\n",
    r);
  dma_fence_put(fence);





[PATCH 2/5] drm/amd/sched: fix an outdated comment

2017-09-28 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Signed-off-by: Nicolai Hähnle 
---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 742d724cd720..6e899c593b7e 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -347,22 +347,21 @@ static bool amd_sched_entity_in(struct amd_sched_job 
*sched_job)
 
/* first job wakes up scheduler */
if (first) {
/* Add the entity to the run queue */
amd_sched_rq_add_entity(entity->rq, entity);
amd_sched_wakeup(sched);
}
return added;
 }
 
-/* job_finish is called after hw fence signaled, and
- * the job had already been deleted from ring_mirror_list
+/* job_finish is called after hw fence signaled
  */
 static void amd_sched_job_finish(struct work_struct *work)
 {
struct amd_sched_job *s_job = container_of(work, struct amd_sched_job,
   finish_work);
struct amd_gpu_scheduler *sched = s_job->sched;
 
/* remove job from ring_mirror_list */
spin_lock(&sched->job_list_lock);
list_del_init(&s_job->node);
-- 
2.11.0



[PATCH 3/5] drm/amd/sched: move adding finish callback to amd_sched_job_begin

2017-09-28 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The finish callback is responsible for removing the job from the ring
mirror list, among other things. It makes sense to add it as callback
in the place where the job is added to the ring mirror list.

Signed-off-by: Nicolai Hähnle 
Acked-by: Christian König 
---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 6e899c593b7e..e793312e351c 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -388,20 +388,23 @@ static void amd_sched_job_finish_cb(struct dma_fence *f,
 {
struct amd_sched_job *job = container_of(cb, struct amd_sched_job,
 finish_cb);
schedule_work(&job->finish_work);
 }
 
 static void amd_sched_job_begin(struct amd_sched_job *s_job)
 {
struct amd_gpu_scheduler *sched = s_job->sched;
 
+   dma_fence_add_callback(&s_job->s_fence->finished, &s_job->finish_cb,
+  amd_sched_job_finish_cb);
+
spin_lock(&sched->job_list_lock);
list_add_tail(&s_job->node, &sched->ring_mirror_list);
if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
list_first_entry_or_null(&sched->ring_mirror_list,
 struct amd_sched_job, node) == s_job)
schedule_delayed_work(&s_job->work_tdr, sched->timeout);
spin_unlock(&sched->job_list_lock);
 }
 
 static void amd_sched_job_timedout(struct work_struct *work)
@@ -480,22 +483,20 @@ void amd_sched_job_recovery(struct amd_gpu_scheduler 
*sched)
  *
  * @sched_job  The pointer to job required to submit
  *
  * Returns 0 for success, negative error code otherwise.
  */
 void amd_sched_entity_push_job(struct amd_sched_job *sched_job)
 {
struct amd_sched_entity *entity = sched_job->s_entity;
 
trace_amd_sched_job(sched_job);
-   dma_fence_add_callback(&sched_job->s_fence->finished, 
&sched_job->finish_cb,
-  amd_sched_job_finish_cb);
wait_event(entity->sched->job_scheduled,
   amd_sched_entity_in(sched_job));
 }
 
 /* init a sched_job with basic field */
 int amd_sched_job_init(struct amd_sched_job *job,
   struct amd_gpu_scheduler *sched,
   struct amd_sched_entity *entity,
   void *owner)
 {
-- 
2.11.0



[PATCH 1/5] drm/amd/sched: rename amd_sched_entity_pop_job

2017-09-28 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The function does not actually remove the job from the FIFO, so "peek"
describes it better.

Signed-off-by: Nicolai Hähnle 
---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 97c94f9683fa..742d724cd720 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -301,21 +301,21 @@ static bool amd_sched_entity_add_dependency_cb(struct 
amd_sched_entity *entity)
 
if (!dma_fence_add_callback(entity->dependency, &entity->cb,
amd_sched_entity_wakeup))
return true;
 
dma_fence_put(entity->dependency);
return false;
 }
 
 static struct amd_sched_job *
-amd_sched_entity_pop_job(struct amd_sched_entity *entity)
+amd_sched_entity_peek_job(struct amd_sched_entity *entity)
 {
struct amd_gpu_scheduler *sched = entity->sched;
struct amd_sched_job *sched_job;
 
if (!kfifo_out_peek(&entity->job_queue, &sched_job, sizeof(sched_job)))
return NULL;
 
while ((entity->dependency = sched->ops->dependency(sched_job)))
if (amd_sched_entity_add_dependency_cb(entity))
return NULL;
@@ -593,21 +593,21 @@ static int amd_sched_main(void *param)
struct dma_fence *fence;
 
wait_event_interruptible(sched->wake_up_worker,
 (!amd_sched_blocked(sched) &&
  (entity = 
amd_sched_select_entity(sched))) ||
 kthread_should_stop());
 
if (!entity)
continue;
 
-   sched_job = amd_sched_entity_pop_job(entity);
+   sched_job = amd_sched_entity_peek_job(entity);
if (!sched_job)
continue;
 
s_fence = sched_job->s_fence;
 
atomic_inc(&sched->hw_rq_count);
amd_sched_job_begin(sched_job);
 
fence = sched->ops->run_job(sched_job);
amd_sched_fence_scheduled(s_fence);
-- 
2.11.0

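The rename above rests on the distinction that kfifo_out_peek() copies the head element without consuming it, while kfifo_out() removes it. A toy queue makes the difference concrete (plain C, not the kernel kfifo API):

```c
#include <assert.h>

/* Toy FIFO illustrating peek vs. pop semantics. */
struct toy_fifo { int data[8]; int head; int count; };

/* "peek": copy the head element out, leave the queue unchanged. */
int toy_peek(const struct toy_fifo *f, int *out)
{
	if (f->count == 0)
		return 0;
	*out = f->data[f->head];
	return 1;
}

/* "pop": copy the head element out and remove it. */
int toy_pop(struct toy_fifo *f, int *out)
{
	if (!toy_peek(f, out))
		return 0;
	f->head = (f->head + 1) % 8;
	f->count--;
	return 1;
}
```

Since the scheduler's function only ever peeks (the job stays queued until its dependencies resolve), "peek" names the behavior accurately where "pop" did not.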


[PATCH 4/5] drm/amd/sched: NULL out the s_fence field after run_job

2017-09-28 Thread Nicolai Hähnle
From: Nicolai Hähnle 

amd_sched_process_job drops the fence reference, so NULL out the s_fence
field before adding it as a callback to guard against accidentally using
s_fence after it may have been freed.

Signed-off-by: Nicolai Hähnle 
Acked-by: Christian König 
---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index e793312e351c..54eb77cffd9b 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -604,20 +604,23 @@ static int amd_sched_main(void *param)
if (!sched_job)
continue;
 
s_fence = sched_job->s_fence;
 
atomic_inc(&sched->hw_rq_count);
amd_sched_job_begin(sched_job);
 
fence = sched->ops->run_job(sched_job);
amd_sched_fence_scheduled(s_fence);
+
+   sched_job->s_fence = NULL;
+
if (fence) {
s_fence->parent = dma_fence_get(fence);
r = dma_fence_add_callback(fence, &s_fence->cb,
   amd_sched_process_job);
if (r == -ENOENT)
amd_sched_process_job(fence, &s_fence->cb);
else if (r)
DRM_ERROR("fence add callback failed (%d)\n",
  r);
dma_fence_put(fence);
-- 
2.11.0



[PATCH 5/5] drm/amd/sched: signal and free remaining fences in amd_sched_entity_fini

2017-09-28 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Highly concurrent Piglit runs can trigger a race condition where a pending
SDMA job on a buffer object is never executed because the corresponding
process is killed (perhaps due to a crash). Since the job's fences were
never signaled, the buffer object was effectively leaked. Worse, the
buffer was stuck wherever it happened to be at the time, possibly in VRAM.

The symptom was user space processes stuck in interruptible waits with
kernel stacks like:

[] dma_fence_default_wait+0x112/0x250
[] dma_fence_wait_timeout+0x39/0xf0
[] reservation_object_wait_timeout_rcu+0x1c2/0x300
[] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
[] ttm_mem_evict_first+0xba/0x1a0 [ttm]
[] ttm_bo_mem_space+0x341/0x4c0 [ttm]
[] ttm_bo_validate+0xd4/0x150 [ttm]
[] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
[] amdgpu_bo_create_restricted+0x1f3/0x470 [amdgpu]
[] amdgpu_bo_create+0xda/0x220 [amdgpu]
[] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
[] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
[] drm_ioctl+0x1fa/0x480 [drm]
[] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[] do_vfs_ioctl+0xa3/0x5f0
[] SyS_ioctl+0x79/0x90
[] entry_SYSCALL_64_fastpath+0x1e/0xad
[] 0x

Signed-off-by: Nicolai Hähnle 
Acked-by: Christian König 
---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
index 54eb77cffd9b..32a99e980d78 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
@@ -220,22 +220,27 @@ void amd_sched_entity_fini(struct amd_gpu_scheduler 
*sched,
amd_sched_entity_is_idle(entity));
amd_sched_rq_remove_entity(rq, entity);
if (r) {
struct amd_sched_job *job;
 
		/* Park the kernel for a moment to make sure it isn't processing
		 * our entity.
		 */
kthread_park(sched->thread);
kthread_unpark(sched->thread);
-   while (kfifo_out(&entity->job_queue, &job, sizeof(job)))
+   while (kfifo_out(&entity->job_queue, &job, sizeof(job))) {
+   struct amd_sched_fence *s_fence = job->s_fence;
+   amd_sched_fence_scheduled(s_fence);
+   amd_sched_fence_finished(s_fence);
+   dma_fence_put(&s_fence->finished);
sched->ops->free_job(job);
+   }
 
}
kfifo_free(&entity->job_queue);
 }
 
 static void amd_sched_entity_wakeup(struct dma_fence *f, struct dma_fence_cb 
*cb)
 {
struct amd_sched_entity *entity =
container_of(cb, struct amd_sched_entity, cb);
entity->dependency = NULL;
-- 
2.11.0



Re: [PATCH umr 1/4] Fix wave SGPR reading

2017-09-11 Thread Nicolai Hähnle

On 11.09.2017 15:31, Tom St Denis wrote:

Hi Nicolai,

I don't get this patch: 'x' starts at 0 and goes to sgpr_size, but that 
doesn't include the offset into the SGPR space, right?


I mean I get the patch in umr_read_sgprs() but in print_waves() won't 
that mean you're printing out SGPRS[0..size]?


Or are you saying having the base added to the printout is confusing for 
UMD debugging since the shader you're debugging probably doesn't have 
the offsets explicitly stated?


Precisely.

It's not a huge deal, but it's slightly easier to follow along the 
register dump and shader disassembly side-by-side when they both use the 
same indexing into the register files :)


Cheers,
Nicolai




Cheers,
Tom

On 09/09/17 06:55 AM, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

The hardware adds the alloc base already, no need to do it in the tool.

Signed-off-by: Nicolai Hähnle 
---
  src/app/print_waves.c | 8 
  src/lib/read_sgpr.c   | 5 +++--
  2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/app/print_waves.c b/src/app/print_waves.c
index 1efd8a1..a9aaf39 100644
--- a/src/app/print_waves.c
+++ b/src/app/print_waves.c
@@ -75,22 +75,22 @@ void umr_print_waves(struct umr_asic *asic)
  "\n",
  (unsigned)se, (unsigned)sh, (unsigned)cu, 
(unsigned)ws.hw_id.simd_id, (unsigned)ws.hw_id.wave_id,
  (unsigned long)ws.wave_status.value, (unsigned long)ws.pc_hi, 
(unsigned long)ws.pc_lo,
  (unsigned long)ws.wave_inst_dw0, (unsigned long)ws.wave_inst_dw1, 
(unsigned long)ws.exec_hi, (unsigned long)ws.exec_lo,
  (unsigned long)ws.hw_id.value, (unsigned long)ws.gpr_alloc.value, 
(unsigned long)ws.lds_alloc.value, (unsigned long)ws.trapsts.value, 
(unsigned long)ws.ib_sts.value,
  (unsigned long)ws.tba_hi, (unsigned long)ws.tba_lo, (unsigned 
long)ws.tma_hi, (unsigned long)ws.tma_lo, (unsigned long)ws.ib_dbg0, 
(unsigned long)ws.m0

  );
  			if (ws.wave_status.halt)
  				for (x = 0; x < ((ws.gpr_alloc.sgpr_size + 1) << shift); x += 4)
  					printf(">SGPRS[%u..%u] = { %08lx, %08lx, %08lx, %08lx }\n",
-						(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x),
-						(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x + 3),
+						(unsigned)(x),
+						(unsigned)(x + 3),
  						(unsigned long)sgprs[x],
  						(unsigned long)sgprs[x+1],
  						(unsigned long)sgprs[x+2],
  						(unsigned long)sgprs[x+3]);
  			pgm_addr = (((uint64_t)ws.pc_hi << 32) | ws.pc_lo) - (sizeof(opcodes)/2);
  			umr_read_vram(asic, ws.hw_id.vm_id, pgm_addr, sizeof(opcodes), opcodes);

  for (x = 0; x < sizeof(opcodes)/4; x++) {
  printf(">pgm[%lu@%llx] = %08lx\n",
  (unsigned long)ws.hw_id.vm_id,
@@ -156,22 +156,22 @@ void umr_print_waves(struct umr_asic *asic)
  Hv("GPR_ALLOC", ws.gpr_alloc.value);
  PP(gpr_alloc, vgpr_base);
  PP(gpr_alloc, vgpr_size);
  PP(gpr_alloc, sgpr_base);
  PP(gpr_alloc, sgpr_size);
  if (ws.wave_status.halt) {
  printf("\n\nSGPRS:\n");
  				for (x = 0; x < ((ws.gpr_alloc.sgpr_size + 1) << shift); x += 4)
  					printf("\t[%4u..%4u] = { %08lx, %08lx, %08lx, %08lx }\n",
-						(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x),
-						(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x + 3),
+						(unsigned)(x),
+						(unsigned)(x + 3),
  						(unsigned long)sgprs[x],
  						(unsigned long)sgprs[x+1],
  						(unsigned long)sgprs[x+2],
  						(unsigned long)sgprs[x+3]);
  			}
  			printf("\n\nPGM_MEM:\n");
  			pgm_addr = (((uint64_t)ws.pc_hi << 32) | ws.pc_lo) - (sizeof(opcodes)/2);
  			umr_read_vram(asic, ws.hw_id.vm_id, pgm_addr, sizeof(opcodes), opcodes);

  for (x = 0; x < sizeof(opcodes)/4; x++) {
diff --git a/src/lib/read_sgpr.c b/src/lib/read_sgpr.c
index cceb189..427cfc5 100644
--- a/src/lib/read_sgpr.c
+++ b/src/lib/read_sgpr.c
@@ -56,27 +56,28 @@ int umr_read_sgprs(struct umr_asic *asic, struct 
u

[PATCH umr 3/4] Read VGPRs of halted waves on gfx9

2017-09-09 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Signed-off-by: Nicolai Hähnle 
---
 src/app/print_waves.c | 40 +++-
 src/lib/read_gpr.c| 30 ++
 src/umr.h |  1 +
 3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/src/app/print_waves.c b/src/app/print_waves.c
index a9aaf39..a72d224 100644
--- a/src/app/print_waves.c
+++ b/src/app/print_waves.c
@@ -29,20 +29,22 @@
 
 #define P(x) if (col++ == 4) { col = 1; printf("\n\t"); } printf("%20s: %8u | ", #x, (unsigned)ws.x);
 #define X(x) if (col++ == 4) { col = 1; printf("\n\t"); } printf("%20s: %08lx | ", #x, (unsigned long)ws.x);
 
 #define H(x) if (col) { printf("\n"); }; col = 0; printf("\n\n%s:\n\t", x);
 #define Hv(x, y) if (col) { printf("\n"); }; col = 0; printf("\n\n%s[%08lx]:\n\t", x, (unsigned long)y);
 
 void umr_print_waves(struct umr_asic *asic)
 {
uint32_t x, se, sh, cu, simd, wave, sgprs[1024], shift, opcodes[8];
+   uint32_t vgprs[64 * 256];
+   uint32_t thread;
uint64_t pgm_addr;
struct umr_wave_status ws;
int first = 1, col = 0;
 
if (asic->options.halt_waves)
umr_sq_cmd_halt_waves(asic, UMR_SQ_CMD_HALT);
 
if (asic->family <= FAMILY_CIK)
		shift = 3;  // on SI..CIK allocations were done in 8-dword blocks
else
@@ -50,24 +52,36 @@ void umr_print_waves(struct umr_asic *asic)
 
for (se = 0; se < asic->config.gfx.max_shader_engines; se++)
for (sh = 0; sh < asic->config.gfx.max_sh_per_se; sh++)
for (cu = 0; cu < asic->config.gfx.max_cu_per_sh; cu++) {
umr_get_wave_sq_info(asic, se, sh, cu, &ws);
if (ws.sq_info.busy) {
for (simd = 0; simd < 4; simd++)
for (wave = 0; wave < 10; wave++) { //both simd/wave are hard coded at the moment...
	umr_get_wave_status(asic, se, sh, cu, simd, wave, &ws);
	if (ws.wave_status.halt || ws.wave_status.valid) {
+   unsigned have_vgprs = 0;
+
// grab sgprs..
-	if (ws.wave_status.halt)
+	if (ws.wave_status.halt) {
		umr_read_sgprs(asic, &ws, &sgprs[0]);
 
+		if (options.bitfields) {
+			have_vgprs = 1;
+			for (thread = 0; thread < 64; ++thread) {
+				if (umr_read_vgprs(asic, &ws, thread,
+						   &vgprs[256 * thread]) < 0)
+					have_vgprs = 0;
+			}
+		}
+	}
+
if (!options.bitfields && first) {
first = 0;
printf("SE SH CU SIMD WAVE# WAVE_STATUS PC_HI PC_LO INST_DW0 INST_DW1 EXEC_HI EXEC_LO HW_ID GPRALLOC LDSALLOC TRAPSTS IBSTS TBA_HI TBA_LO TMA_HI TMA_LO IB_DBG0 M0\n");
}
if (!options.bitfields) {
printf(
 "%u %u %u %u %u " // se/sh/cu/simd/wave
 "%08lx %08lx %08lx " // wave_status pc/hi/lo
 "%08lx %08lx %08lx %08lx " // inst0/1 exec hi/lo
 "%08lx %08lx %08lx %08lx %08lx " // HW_ID GPR/LDSALLOC TRAP/IB STS
@@ -164,20 +178,44 @@ void umr_print_waves(struct umr_asic *asic)
for (x = 0; x < ((ws.gpr_alloc.sgpr_size + 1) << shift); x += 4)
	printf("\t[%4u..%4u] = { %08lx, %08lx, %08lx, %08lx }\n",
		(unsigned)(x),
		(unsigned)(x + 3),
		(unsigned long)sgprs[x],
		(unsigned long)sgprs[x+1],
		(unsigned long)sgprs[x+2],
		(unsigned long)sgprs[x+3]);
}
 
+
+

[PATCH umr 0/4] gfx9: read VGPRs of halted waves

2017-09-09 Thread Nicolai Hähnle
Hi all,

it seems that reading wave VGPRs actually works now with gfx9, at least
for halted waves, and that can be pretty powerful for debugging.

Due to the volume of data it prints out, it's only enabled with -O bits.

In order for it to work properly, you'll also need the kernel patch
I just sent out (drm/amdgpu/gfx9: implement wave VGPR reading).

Please review!

Thanks,
Nicolai

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH umr 1/4] Fix wave SGPR reading

2017-09-09 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The hardware adds the alloc base already, no need to do it in the tool.

Signed-off-by: Nicolai Hähnle 
---
 src/app/print_waves.c | 8 
 src/lib/read_sgpr.c   | 5 +++--
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/app/print_waves.c b/src/app/print_waves.c
index 1efd8a1..a9aaf39 100644
--- a/src/app/print_waves.c
+++ b/src/app/print_waves.c
@@ -75,22 +75,22 @@ void umr_print_waves(struct umr_asic *asic)
 "\n",
 (unsigned)se, (unsigned)sh, (unsigned)cu, (unsigned)ws.hw_id.simd_id, (unsigned)ws.hw_id.wave_id,
 (unsigned long)ws.wave_status.value, (unsigned long)ws.pc_hi, (unsigned long)ws.pc_lo,
 (unsigned long)ws.wave_inst_dw0, (unsigned long)ws.wave_inst_dw1, (unsigned long)ws.exec_hi, (unsigned long)ws.exec_lo,
 (unsigned long)ws.hw_id.value, (unsigned long)ws.gpr_alloc.value, (unsigned long)ws.lds_alloc.value, (unsigned long)ws.trapsts.value, (unsigned long)ws.ib_sts.value,
 (unsigned long)ws.tba_hi, (unsigned long)ws.tba_lo, (unsigned long)ws.tma_hi, (unsigned long)ws.tma_lo, (unsigned long)ws.ib_dbg0, (unsigned long)ws.m0
 );
	if (ws.wave_status.halt)
		for (x = 0; x < ((ws.gpr_alloc.sgpr_size + 1) << shift); x += 4)
			printf(">SGPRS[%u..%u] = { %08lx, %08lx, %08lx, %08lx }\n",
-				(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x),
-				(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x + 3),
+				(unsigned)(x),
+				(unsigned)(x + 3),
				(unsigned long)sgprs[x],
				(unsigned long)sgprs[x+1],
				(unsigned long)sgprs[x+2],
				(unsigned long)sgprs[x+3]);
 
	pgm_addr = (((uint64_t)ws.pc_hi << 32) | ws.pc_lo) - (sizeof(opcodes)/2);
	umr_read_vram(asic, ws.hw_id.vm_id, pgm_addr, sizeof(opcodes), opcodes);
	for (x = 0; x < sizeof(opcodes)/4; x++) {
		printf(">pgm[%lu@%llx] = %08lx\n",
			(unsigned long)ws.hw_id.vm_id,
@@ -156,22 +156,22 @@ void umr_print_waves(struct umr_asic *asic)
			Hv("GPR_ALLOC", ws.gpr_alloc.value);
			PP(gpr_alloc, vgpr_base);
			PP(gpr_alloc, vgpr_size);
			PP(gpr_alloc, sgpr_base);
			PP(gpr_alloc, sgpr_size);
 
			if (ws.wave_status.halt) {
				printf("\n\nSGPRS:\n");
				for (x = 0; x < ((ws.gpr_alloc.sgpr_size + 1) << shift); x += 4)
					printf("\t[%4u..%4u] = { %08lx, %08lx, %08lx, %08lx }\n",
-						(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x),
-						(unsigned)((ws.gpr_alloc.sgpr_base << shift) + x + 3),
+						(unsigned)(x),
+						(unsigned)(x + 3),
						(unsigned long)sgprs[x],
						(unsigned long)sgprs[x+1],
						(unsigned long)sgprs[x+2],
						(unsigned long)sgprs[x+3]);
			}
 
			printf("\n\nPGM_MEM:\n");
			pgm_addr = (((uint64_t)ws.pc_hi << 32) | ws.pc_lo) - (sizeof(opcodes)/2);
			umr_read_vram(asic, ws.hw_id.vm_id, pgm_addr, sizeof(opcodes), opcodes);

[PATCH umr 4/4] Fix the no-kernel case of wave SGPR reading

2017-09-09 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Signed-off-by: Nicolai Hähnle 
---
 src/lib/read_gpr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/lib/read_gpr.c b/src/lib/read_gpr.c
index 669a49b..e6138a9 100644
--- a/src/lib/read_gpr.c
+++ b/src/lib/read_gpr.c
@@ -68,21 +68,21 @@ int umr_read_sgprs(struct umr_asic *asic, struct umr_wave_status *ws, uint32_t *
((uint64_t)ws->hw_id.sh_id << 20)|
((uint64_t)ws->hw_id.cu_id << 28)|
((uint64_t)ws->hw_id.wave_id << 36)  |
((uint64_t)ws->hw_id.simd_id << 44)  |
(0ULL << 52); // thread_id
 
	lseek(asic->fd.gpr, addr, SEEK_SET);
	return read(asic->fd.gpr, dst, 4 * ((ws->gpr_alloc.sgpr_size + 1) << shift));
} else {
	umr_grbm_select_index(asic, ws->hw_id.se_id, ws->hw_id.sh_id, ws->hw_id.cu_id);
-	wave_read_regs_via_mmio(asic, ws->hw_id.simd_id, ws->hw_id.wave_id, 0, 0,
+	wave_read_regs_via_mmio(asic, ws->hw_id.simd_id, ws->hw_id.wave_id, 0, 0x200,
		(ws->gpr_alloc.sgpr_size + 1) << shift, dst);
	umr_grbm_select_index(asic, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF);
	return 0;
}
 }
 
 
 int umr_read_vgprs(struct umr_asic *asic, struct umr_wave_status *ws, uint32_t thread, uint32_t *dst)
 {
uint64_t addr;
-- 
2.11.0



[PATCH umr 2/4] Rename lib/read_sgpr.c to lib/read_gpr.c

2017-09-09 Thread Nicolai Hähnle
From: Nicolai Hähnle 

We will implement VGPR reading, hence this is a better name.

Signed-off-by: Nicolai Hähnle 
---
 src/lib/CMakeLists.txt  | 2 +-
 src/lib/{read_sgpr.c => read_gpr.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename src/lib/{read_sgpr.c => read_gpr.c} (100%)

diff --git a/src/lib/CMakeLists.txt b/src/lib/CMakeLists.txt
index 0675b99..78d827a 100644
--- a/src/lib/CMakeLists.txt
+++ b/src/lib/CMakeLists.txt
@@ -11,21 +11,21 @@ add_library(umrcore STATIC
   create_mmio_accel.c
   discover_by_did.c
   discover_by_name.c
   discover.c
   dump_ib.c
   find_reg.c
   free_maps.c
   mmio.c
   query_drm.c
   read_sensor.c
-  read_sgpr.c
+  read_gpr.c
   read_vram.c
   ring_decode.c
   scan_config.c
   sq_cmd_halt_waves.c
   transfer_soc15.c
   wave_status.c
   update.c
   $ $
 )
 
diff --git a/src/lib/read_sgpr.c b/src/lib/read_gpr.c
similarity index 100%
rename from src/lib/read_sgpr.c
rename to src/lib/read_gpr.c
-- 
2.11.0



[PATCH] drm/amdgpu/gfx9: implement wave VGPR reading

2017-09-09 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This is already hooked up to the "amdgpu_gpr" debugfs file used by
the umr userspace debugging tool.

Signed-off-by: Nicolai Hähnle 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index d1c8729a3534..8956f3ab271a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -929,26 +929,36 @@ static void gfx_v9_0_read_wave_data(struct amdgpu_device *adev, uint32_t simd, u
 
 static void gfx_v9_0_read_wave_sgprs(struct amdgpu_device *adev, uint32_t simd,
 uint32_t wave, uint32_t start,
 uint32_t size, uint32_t *dst)
 {
wave_read_regs(
adev, simd, wave, 0,
start + SQIND_WAVE_SGPRS_OFFSET, size, dst);
 }
 
+static void gfx_v9_0_read_wave_vgprs(struct amdgpu_device *adev, uint32_t simd,
+uint32_t wave, uint32_t thread,
+uint32_t start, uint32_t size,
+uint32_t *dst)
+{
+   wave_read_regs(
+   adev, simd, wave, thread,
+   start + SQIND_WAVE_VGPRS_OFFSET, size, dst);
+}
 
 static const struct amdgpu_gfx_funcs gfx_v9_0_gfx_funcs = {
.get_gpu_clock_counter = &gfx_v9_0_get_gpu_clock_counter,
.select_se_sh = &gfx_v9_0_select_se_sh,
.read_wave_data = &gfx_v9_0_read_wave_data,
.read_wave_sgprs = &gfx_v9_0_read_wave_sgprs,
+   .read_wave_vgprs = &gfx_v9_0_read_wave_vgprs,
 };
 
 static void gfx_v9_0_gpu_early_init(struct amdgpu_device *adev)
 {
u32 gb_addr_config;
 
adev->gfx.funcs = &gfx_v9_0_gfx_funcs;
 
switch (adev->asic_type) {
case CHIP_VEGA10:
-- 
2.11.0



[PATCH umr] wave_status: enable on Raven

2017-07-26 Thread Nicolai Hähnle
From: Nicolai Hähnle 

I've been using this for a while now.

Signed-off-by: Nicolai Hähnle 
---
 src/lib/wave_status.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/lib/wave_status.c b/src/lib/wave_status.c
index 6b8098e..fe2add7 100644
--- a/src/lib/wave_status.c
+++ b/src/lib/wave_status.c
@@ -310,7 +310,7 @@ int umr_get_wave_status(struct umr_asic *asic, unsigned se, unsigned sh, unsigne
 
 int umr_get_wave_sq_info(struct umr_asic *asic, unsigned se, unsigned sh, unsigned cu, struct umr_wave_status *ws)
 {
-   if (asic->family <= FAMILY_AI)
+   if (asic->family <= FAMILY_RV)
return umr_get_wave_sq_info_vi(asic, se, sh, cu, ws);
return -1;
 }
-- 
2.9.3



[PATCH] drm/amdgpu/gfx9: simplify and fix GRBM index selection

2017-07-14 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Copy the approach taken by gfx8, which simplifies the code, and set the
instance index properly. The latter is required for debugging, e.g. for
reading wave status by UMR.

Signed-off-by: Nicolai Hähnle 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 6986285..020da95 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1468,35 +1468,37 @@ static int gfx_v9_0_sw_fini(void *handle)
 }
 
 
 static void gfx_v9_0_tiling_mode_table_init(struct amdgpu_device *adev)
 {
/* TODO */
 }
 
 static void gfx_v9_0_select_se_sh(struct amdgpu_device *adev, u32 se_num, u32 sh_num, u32 instance)
 {
-	u32 data = REG_SET_FIELD(0, GRBM_GFX_INDEX, INSTANCE_BROADCAST_WRITES, 1);
+	u32 data;
 
-	if ((se_num == 0xffffffff) && (sh_num == 0xffffffff)) {
-		data = REG_SET_FIELD(data, GRBM_GFX_INDEX, SH_BROADCAST_WRITES, 1);
-		data = REG_SET_FIELD(data, GRBM_GFX_INDEX, SE_BROADCAST_WRITES, 1);
-	} else if (se_num == 0xffffffff) {
-		data = REG_SET_FIELD(data, GRBM_GFX_INDEX, SH_INDEX, sh_num);
+	if (instance == 0xffffffff)
+		data = REG_SET_FIELD(0, GRBM_GFX_INDEX, INSTANCE_BROADCAST_WRITES, 1);
+	else
+		data = REG_SET_FIELD(0, GRBM_GFX_INDEX, INSTANCE_INDEX, instance);
+
+	if (se_num == 0xffffffff)
 		data = REG_SET_FIELD(data, GRBM_GFX_INDEX, SE_BROADCAST_WRITES, 1);
-	} else if (sh_num == 0xffffffff) {
-		data = REG_SET_FIELD(data, GRBM_GFX_INDEX, SH_BROADCAST_WRITES, 1);
+	else
 		data = REG_SET_FIELD(data, GRBM_GFX_INDEX, SE_INDEX, se_num);
-	} else {
+
+	if (sh_num == 0xffffffff)
+		data = REG_SET_FIELD(data, GRBM_GFX_INDEX, SH_BROADCAST_WRITES, 1);
+	else
 		data = REG_SET_FIELD(data, GRBM_GFX_INDEX, SH_INDEX, sh_num);
-		data = REG_SET_FIELD(data, GRBM_GFX_INDEX, SE_INDEX, se_num);
-	}
+
 	WREG32_SOC15(GC, 0, mmGRBM_GFX_INDEX, data);
 }
 
 static u32 gfx_v9_0_get_rb_active_bitmap(struct amdgpu_device *adev)
 {
u32 data, mask;
 
data = RREG32_SOC15(GC, 0, mmCC_RB_BACKEND_DISABLE);
data |= RREG32_SOC15(GC, 0, mmGC_USER_RB_BACKEND_DISABLE);
 
-- 
2.9.3



[PATCH v2] drm/amd/sched: print sched job id in amd_sched_job trace

2017-06-27 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This makes it easier to correlate amd_sched_job with other trace
points that don't log the job pointer.

v2: don't print the sched_job pointer (Andres)

Signed-off-by: Nicolai Hähnle 
Reviewed-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h b/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h
index dbd4fd3a..8bd3810 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h
+++ b/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h
@@ -9,39 +9,40 @@
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM gpu_sched
 #define TRACE_INCLUDE_FILE gpu_sched_trace
 
 TRACE_EVENT(amd_sched_job,
TP_PROTO(struct amd_sched_job *sched_job),
TP_ARGS(sched_job),
TP_STRUCT__entry(
 __field(struct amd_sched_entity *, entity)
-__field(struct amd_sched_job *, sched_job)
 __field(struct dma_fence *, fence)
 __field(const char *, name)
+__field(uint64_t, id)
 __field(u32, job_count)
 __field(int, hw_job_count)
 ),
 
TP_fast_assign(
   __entry->entity = sched_job->s_entity;
-  __entry->sched_job = sched_job;
+  __entry->id = sched_job->id;
   __entry->fence = &sched_job->s_fence->finished;
   __entry->name = sched_job->sched->name;
		  __entry->job_count = kfifo_len(&sched_job->s_entity->job_queue) / sizeof(sched_job);
   __entry->hw_job_count = atomic_read(
   &sched_job->sched->hw_rq_count);
   ),
-	TP_printk("entity=%p, sched job=%p, fence=%p, ring=%s, job count:%u, hw job count:%d",
-		  __entry->entity, __entry->sched_job, __entry->fence, __entry->name,
+	TP_printk("entity=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw job count:%d",
+		  __entry->entity, __entry->id,
+		  __entry->fence, __entry->name,
 		  __entry->job_count, __entry->hw_job_count)
 );
 
 TRACE_EVENT(amd_sched_process_job,
TP_PROTO(struct amd_sched_fence *fence),
TP_ARGS(fence),
TP_STRUCT__entry(
__field(struct dma_fence *, fence)
),
 
-- 
2.9.3



Re: GART write flush error on SI w/ amdgpu

2017-06-20 Thread Nicolai Hähnle

On 20.06.2017 12:34, Marek Olšák wrote:

BTW, I noticed the flush sequence in the kernel is wrong. The correct
flush sequence should be:

1) EVENT_WRITE_EOP - CACHE_FLUSH_AND_INV_TS - write a dword to memory,
but no fence/interrupt.
2) WAIT_REG_MEM on the dword to wait for idle before SURFACE_SYNC.
3) SURFACE_SYNC (TC, K$, I$)
4) Write CP_COHER_CNTL2.
5) EVENT_WRITE_EOP - BOTTOM_OF_PIPE_TS - write the fence with the interrupt.

WAIT_REG_MEM wouldn't be needed if we were able to merge
CACHE_FLUSH_AND_INV, SURFACE_SYNC, and CP_COHER_CNTL2 into one EOP
event.

The main issue with the current flush sequence in radeon and amdgpu is
that it doesn't wait for idle before writing CP_COHER_CNTL2 and
SURFACE_SYNC. So far we've been able to avoid the bug by waiting for
idle in userspace IBs.


This is gfx9-only though, right?

Cheers,
Nicolai




Marek


On Fri, May 26, 2017 at 5:47 PM, Marek Olšák  wrote:

On Tue, May 9, 2017 at 2:13 PM, Nicolai Hähnle  wrote:

Hi all,

I'm seeing some very strange errors on Verde with CPU readback from GART,
and am pretty much out of ideas. Some help would be very much appreciated.

The error manifests with the
GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo test on amdgpu,
but
*not* on radeon. Here's what the test does:

1. Upload a texture.
2. Read the texture back via a shader that uses shader buffer writes to
write data to a buffer that is allocated in GART.
3. The CPU then reads from the buffer -- and sometimes gets stale data.

This sequence is repeated for many sub-tests. There are some sub-tests
where
the CPU reads stale data from the buffer, i.e. the shader writes simply
don't make it to the CPU. The tests vary superficially, e.g. the first
failing test is (almost?) always one where data is written in 16-bit words
(but there are succeeding sub-tests with 16-bit writes as well).

The bug is *not* a timing issue. Adding even a 1sec delay (sleep(1);)
between the fence wait and the return of glMapBuffer does not fix the
problem. The data must be stuck in a cache somewhere.

Since the test runs okay with the radeon module, I tried some changes
based
on comparing the IB submit between radeon and amdgpu, and based on
comparing
register settings via scans obtained from umr. Some of the things I've
tried:

- Set HDP_MISC_CNTL.FLUSH_INVALIDATE_CACHE to 1 (both radeon and
amdgpu/gfx9
set this)
- Add SURFACE_SYNC packets preceded by setting CP_COHER_CNTL2 to the vmid
(radeon does this)
- Change gfx_v6_0_ring_emit_hdp_invalidate: select ME engine instead of
PFP
(which seems more logical, and is done by gfx7+), or remove the
corresponding WRITE_DATA entirely

None of these changes helped.

What *does* help is adding an artificial wait. Specifically, I'm adding a
sequence of

- WRITE_DATA
- CACHE_FLUSH_AND_INV_TS_EVENT (BOTTOM_OF_PIPE_TS has same behavior)
- WAIT_REG_MEM

as can be seen in the attached patch. This works around the problem, but
it
makes no sense:

Adding the wait sequence *before* the SURFACE_SYNC in ring_emit_fence
works
around the problem. However(!) it does not actually cause the UMD to wait
any longer than before. Without this change, the UMD immediately sees a
signaled user fence (and never uses an ioctl to wait), and with this
change,
it *still* sees a signaled user fence.

Also, note that the way I've hacked the change, the wait sequence is only
added for the user fence emit (and I'm using a modified UMD to ensure that
there is enough memory to be used by the added wait sequence).

Adding the wait sequence *after* the SURFACE_SYNC *doesn't* work around
the
problem.

So for whatever reason, the added wait sequence *before* the SURFACE_SYNC
encourages some part of the GPU to flush the data from wherever it's
stuck,
and that's just really bizarre. There must be something really simple I'm
missing, and any pointers would be appreciated.


Have you tried this?

diff --git a/src/gallium/drivers/radeonsi/si_hw_context.c
b/src/gallium/drivers/radeonsi/si_hw_context.c
index 92c09cb..e6ac0ba 100644
--- a/src/gallium/drivers/radeonsi/si_hw_context.c
+++ b/src/gallium/drivers/radeonsi/si_hw_context.c
@@ -133,7 +133,8 @@ void si_context_gfx_flush(void *context, unsigned flags,
 SI_CONTEXT_PS_PARTIAL_FLUSH;

 /* DRM 3.1.0 doesn't flush TC for VI correctly. */
-   if (ctx->b.chip_class == VI && ctx->b.screen->info.drm_minor <= 1)
+   if ((ctx->b.chip_class == VI && ctx->b.screen->info.drm_minor <= 1) ||
+       (ctx->b.chip_class == SI && ctx->b.screen->info.drm_major == 3))
 ctx->b.flags |= SI_CONTEXT_INV_GLOBAL_L2 |
 SI_CONTEXT_INV_VMEM_L1;

One more cache flush there shouldn't hurt.

Also, Mesa uses PFP_SYNC_ME. It shouldn't be necessary, but it's worth a
try.

Marek



--
Learn how the world really is,
but never forget how it should be.

[PATCH] drm/amd/sched: print sched job id in amd_sched_job trace

2017-06-13 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This makes it easier to correlate amd_sched_job with other trace
points that don't log the job pointer.

Signed-off-by: Nicolai Hähnle 
---
 drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h b/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h
index dbd4fd3a..09c4230 100644
--- a/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h
+++ b/drivers/gpu/drm/amd/scheduler/gpu_sched_trace.h
@@ -12,36 +12,39 @@
 #define TRACE_INCLUDE_FILE gpu_sched_trace
 
 TRACE_EVENT(amd_sched_job,
TP_PROTO(struct amd_sched_job *sched_job),
TP_ARGS(sched_job),
TP_STRUCT__entry(
 __field(struct amd_sched_entity *, entity)
 __field(struct amd_sched_job *, sched_job)
 __field(struct dma_fence *, fence)
 __field(const char *, name)
+__field(uint64_t, id)
 __field(u32, job_count)
 __field(int, hw_job_count)
 ),
 
TP_fast_assign(
   __entry->entity = sched_job->s_entity;
   __entry->sched_job = sched_job;
+  __entry->id = sched_job->id;
   __entry->fence = &sched_job->s_fence->finished;
   __entry->name = sched_job->sched->name;
		  __entry->job_count = kfifo_len(&sched_job->s_entity->job_queue) / sizeof(sched_job);
   __entry->hw_job_count = atomic_read(
   &sched_job->sched->hw_rq_count);
   ),
-	TP_printk("entity=%p, sched job=%p, fence=%p, ring=%s, job count:%u, hw job count:%d",
-		  __entry->entity, __entry->sched_job, __entry->fence, __entry->name,
+	TP_printk("entity=%p, sched job=%p, id=%llu, fence=%p, ring=%s, job count:%u, hw job count:%d",
+		  __entry->entity, __entry->sched_job, __entry->id,
+		  __entry->fence, __entry->name,
 		  __entry->job_count, __entry->hw_job_count)
 );
 
 TRACE_EVENT(amd_sched_process_job,
TP_PROTO(struct amd_sched_fence *fence),
TP_ARGS(fence),
TP_STRUCT__entry(
__field(struct dma_fence *, fence)
),
 
-- 
2.9.3



[PATCH] drm/amdgpu/gfx9: support the amdgpu.disable_cu option

2017-06-13 Thread Nicolai Hähnle
From: Nicolai Hähnle 

This is ported from gfx8.

Signed-off-by: Nicolai Hähnle 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 5d56126..166138b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -4409,51 +4409,71 @@ static void gfx_v9_0_set_gds_init(struct amdgpu_device *adev)
adev->gds.mem.cs_partition_size = 1024;
 
adev->gds.gws.gfx_partition_size = 16;
adev->gds.gws.cs_partition_size = 16;
 
adev->gds.oa.gfx_partition_size = 4;
adev->gds.oa.cs_partition_size = 4;
}
 }
 
+static void gfx_v9_0_set_user_cu_inactive_bitmap(struct amdgpu_device *adev,
+u32 bitmap)
+{
+   u32 data;
+
+   if (!bitmap)
+   return;
+
+   data = bitmap << GC_USER_SHADER_ARRAY_CONFIG__INACTIVE_CUS__SHIFT;
+   data &= GC_USER_SHADER_ARRAY_CONFIG__INACTIVE_CUS_MASK;
+
+   WREG32_SOC15(GC, 0, mmGC_USER_SHADER_ARRAY_CONFIG, data);
+}
+
 static u32 gfx_v9_0_get_cu_active_bitmap(struct amdgpu_device *adev)
 {
u32 data, mask;
 
data = RREG32_SOC15(GC, 0, mmCC_GC_SHADER_ARRAY_CONFIG);
data |= RREG32_SOC15(GC, 0, mmGC_USER_SHADER_ARRAY_CONFIG);
 
data &= CC_GC_SHADER_ARRAY_CONFIG__INACTIVE_CUS_MASK;
data >>= CC_GC_SHADER_ARRAY_CONFIG__INACTIVE_CUS__SHIFT;
 
mask = amdgpu_gfx_create_bitmask(adev->gfx.config.max_cu_per_sh);
 
return (~data) & mask;
 }
 
 static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
 struct amdgpu_cu_info *cu_info)
 {
int i, j, k, counter, active_cu_number = 0;
u32 mask, bitmap, ao_bitmap, ao_cu_mask = 0;
+   unsigned disable_masks[4 * 2];
 
if (!adev || !cu_info)
return -EINVAL;
 
+   amdgpu_gfx_parse_disable_cu(disable_masks, 4, 2);
+
mutex_lock(&adev->grbm_idx_mutex);
for (i = 0; i < adev->gfx.config.max_shader_engines; i++) {
for (j = 0; j < adev->gfx.config.max_sh_per_se; j++) {
mask = 1;
ao_bitmap = 0;
counter = 0;
gfx_v9_0_select_se_sh(adev, i, j, 0x);
+   if (i < 4 && j < 2)
+   gfx_v9_0_set_user_cu_inactive_bitmap(
+   adev, disable_masks[i * 2 + j]);
bitmap = gfx_v9_0_get_cu_active_bitmap(adev);
cu_info->bitmap[i][j] = bitmap;
 
for (k = 0; k < adev->gfx.config.max_cu_per_sh; k ++) {
if (bitmap & mask) {
				if (counter < adev->gfx.config.max_cu_per_sh)
ao_bitmap |= mask;
counter ++;
}
mask <<= 1;
-- 
2.9.3



[PATCH libdrm] amdgpu: add missing extern "C" headers

2017-05-13 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Signed-off-by: Nicolai Hähnle 
---
 amdgpu/amdgpu.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index fdea905..1901fa8 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -30,20 +30,24 @@
  * User wanted to use libdrm_amdgpu functionality must include
  * this file.
  *
  */
 #ifndef _AMDGPU_H_
 #define _AMDGPU_H_
 
 #include 
 #include 
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct drm_amdgpu_info_hw_ip;
 
 /*--*/
 /* --- Defines  */
 /*--*/
 
 /**
  * Define max. number of Command Buffers (IB) which could be sent to the single
  * hardware IP to accommodate CE/DE requirements
  *
@@ -1317,11 +1321,15 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle sem);
 /**
  *  Get the ASIC marketing name
  *
  * \param   dev - \c [in] Device handle. See #amdgpu_device_initialize()
  *
  * \return  the constant string of the marketing name
  *  "NULL" means the ASIC is not found
 */
 const char *amdgpu_get_marketing_name(amdgpu_device_handle dev);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* #ifdef _AMDGPU_H_ */
-- 
2.9.3



GART write flush error on SI w/ amdgpu

2017-05-09 Thread Nicolai Hähnle

Hi all,

I'm seeing some very strange errors on Verde with CPU readback from 
GART, and am pretty much out of ideas. Some help would be very much 
appreciated.


The error manifests with the 
GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo test on amdgpu, 
but *not* on radeon. Here's what the test does:


1. Upload a texture.
2. Read the texture back via a shader that uses shader buffer writes to 
write data to a buffer that is allocated in GART.

3. The CPU then reads from the buffer -- and sometimes gets stale data.

This sequence is repeated for many sub-tests. There are some sub-tests 
where the CPU reads stale data from the buffer, i.e. the shader writes 
simply don't make it to the CPU. The tests vary superficially, e.g. the 
first failing test is (almost?) always one where data is written in 
16-bit words (but there are succeeding sub-tests with 16-bit writes as 
well).


The bug is *not* a timing issue. Adding even a 1sec delay (sleep(1);) 
between the fence wait and the return of glMapBuffer does not fix the 
problem. The data must be stuck in a cache somewhere.


Since the test runs okay with the radeon module, I tried some changes 
based on comparing the IB submit between radeon and amdgpu, and based on 
comparing register settings via scans obtained from umr. Some of the 
things I've tried:


- Set HDP_MISC_CNTL.FLUSH_INVALIDATE_CACHE to 1 (both radeon and 
amdgpu/gfx9 set this)
- Add SURFACE_SYNC packets preceded by setting CP_COHER_CNTL2 to the 
vmid (radeon does this)
- Change gfx_v6_0_ring_emit_hdp_invalidate: select ME engine instead of 
PFP (which seems more logical, and is done by gfx7+), or remove the 
corresponding WRITE_DATA entirely


None of these changes helped.

What *does* help is adding an artificial wait. Specifically, I'm adding 
a sequence of


- WRITE_DATA
- CACHE_FLUSH_AND_INV_TS_EVENT (BOTTOM_OF_PIPE_TS has same behavior)
- WAIT_REG_MEM

as can be seen in the attached patch. This works around the problem, but 
it makes no sense:


Adding the wait sequence *before* the SURFACE_SYNC in ring_emit_fence 
works around the problem. However(!) it does not actually cause the UMD 
to wait any longer than before. Without this change, the UMD immediately 
sees a signaled user fence (and never uses an ioctl to wait), and with 
this change, it *still* sees a signaled user fence.


Also, note that the way I've hacked the change, the wait sequence is 
only added for the user fence emit (and I'm using a modified UMD to 
ensure that there is enough memory to be used by the added wait sequence).


Adding the wait sequence *after* the SURFACE_SYNC *doesn't* work around 
the problem.


So for whatever reason, the added wait sequence *before* the 
SURFACE_SYNC encourages some part of the GPU to flush the data from 
wherever it's stuck, and that's just really bizarre. There must be 
something really simple I'm missing, and any pointers would be appreciated.


Thanks,
Nicolai
--
Learn how the world really is,
but never forget how it should be.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 6e20536..c1715bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -226,20 +226,23 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 
 	/* wrap the last IB with fence */
 	if (job && job->uf_addr) {
 		amdgpu_ring_emit_fence(ring, job->vm_id, job->uf_addr, job->uf_sequence,
    AMDGPU_FENCE_FLAG_64BIT);
 	}
 
 	if (patch_offset != ~0 && ring->funcs->patch_cond_exec)
 		amdgpu_ring_patch_cond_exec(ring, patch_offset);
 
+// 	if (job && ring->funcs->emit_hack)
+// 		ring->funcs->emit_hack(ring, job->vm_id, job->uf_addr + 8);
+
 	ring->current_ctx = fence_ctx;
 	if (vm && ring->funcs->emit_switch_buffer)
 		amdgpu_ring_emit_switch_buffer(ring);
 	amdgpu_ring_commit(ring);
 	return 0;
 }
 
 /**
  * amdgpu_ib_pool_init - Init the IB (Indirect Buffer) pool
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index 5f10aa6..2c37e9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -1867,37 +1867,89 @@ static void gfx_v6_0_ring_emit_vgt_flush(struct amdgpu_ring *ring)
 static void gfx_v6_0_ring_emit_hdp_invalidate(struct amdgpu_ring *ring)
 {
 	amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
 	amdgpu_ring_write(ring, (WRITE_DATA_ENGINE_SEL(0) | /* engine = 0? */
  WRITE_DATA_DST_SEL(0)));
 	amdgpu_ring_write(ring, mmHDP_DEBUG0);
 	amdgpu_ring_write(ring, 0);
 	amdgpu_ring_write(ring, 0x1);
 }
 
+static void gfx_v6_0_ring_emit_hack(struct amdgpu_ring *ring, unsigned vm_id, uint64_t addr)
+{
+	bool write64bit = false;
+	bool int_sel = false;
+
+	amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, 3));
+	amdgpu_ring_write(ring, (WRITE_DATA_ENGINE_SEL(0) |
+ (1 << 20) | /* write confirm */
+ WRITE_DATA_DST_SEL(1)));
+	amdgpu_ring_write(ring, addr & 0xfffc);
+	a

Re: [PATCH] drm/amdgpu/gfx6: flush caches after IB with the correct vmid

2017-05-08 Thread Nicolai Hähnle
Unfortunately, further testing shows that this doesn't actually fix the 
problem. FWIW, that test runs very reliably on SI with the radeon drm, 
but with the amdgpu drm it fails. VI is fine on amdgpu, which is why I 
was sent down this road.


Anyway, back to trying to figure this out :/

Cheers,
Nicolai

On 08.05.2017 11:30, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

Bring the code in line with what the radeon module does.

Without this change, the fence following the IB may be signalled
to the CPU even though some data written by shaders may not have
been written back yet.

This change fixes the OpenGL CTS test
GL45-CTS.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo on Verde.

Signed-off-by: Nicolai Hähnle 
--
So, I'm not too familiar with the details of these syncs, but the radeon
module effectively does:

- IB
- SURFACE_SYNC (vm_id of IB)
- SURFACE_SYNC (vm_id == 0)
- EVENT_WRITE_EOP (kernel fence)

While amdgpu now does (with this change):

- IB
- SURFACE_SYNC (vm_id of IB) <-- this was missing
- SURFACE_SYNC (vm_id == 0)
- EVENT_WRITE_EOP (kernel fence)
- SURFACE_SYNC (vm_id == 0)
- EVENT_WRITE_EOP (user fence)

It seems like at least the second SURFACE_SYNC (vm_id == 0) should be
redundant, so the question is whether the SURFACE_SYNC (vm_id == 0)
should be rearranged somehow, perhaps also added to the IB emission.
But for better bisectability, that should probably be a separate
change.
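For reference, the intended per-submission packet order can be modeled in plain C. This is a hypothetical userspace sketch for illustrating the ordering invariant only; the string labels and function names are assumptions, not actual driver symbols:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the packet order with this patch applied. */
static const char *amdgpu_frame[] = {
	"IB",
	"SURFACE_SYNC(vmid)",            /* the previously missing flush */
	"SURFACE_SYNC(0)",
	"EVENT_WRITE_EOP(kernel fence)",
	"SURFACE_SYNC(0)",               /* possibly redundant */
	"EVENT_WRITE_EOP(user fence)",
};

/* The invariant the patch establishes: a flush for the IB's vmid sits
 * between the IB and the kernel fence, so the fence cannot signal
 * before shader writes have been flushed. */
static int fence_after_vmid_flush(void)
{
	int flush = -1, fence = -1;

	for (int i = 0; i < 6; i++) {
		if (!strcmp(amdgpu_frame[i], "SURFACE_SYNC(vmid)"))
			flush = i;
		if (fence < 0 && strstr(amdgpu_frame[i], "kernel fence"))
			fence = i;
	}
	return flush >= 0 && flush < fence;
}
```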

Cheers,
Nicolai
---
 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index 03d2a0a..c4f444d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -1922,20 +1922,35 @@ static void gfx_v6_0_ring_emit_ib(struct amdgpu_ring 
*ring,
control |= ib->length_dw | (vm_id << 24);

amdgpu_ring_write(ring, header);
amdgpu_ring_write(ring,
 #ifdef __BIG_ENDIAN
  (2 << 0) |
 #endif
  (ib->gpu_addr & 0xFFFFFFFC));
amdgpu_ring_write(ring, upper_32_bits(ib->gpu_addr) & 0xFFFF);
amdgpu_ring_write(ring, control);
+
+   if (!(ib->flags & AMDGPU_IB_FLAG_CE)) {
+   /* flush read cache over gart for this vmid */
+   amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
+   amdgpu_ring_write(ring, (mmCP_COHER_CNTL2 - 
PACKET3_SET_CONFIG_REG_START));
+   amdgpu_ring_write(ring, vm_id);
+   amdgpu_ring_write(ring, PACKET3(PACKET3_SURFACE_SYNC, 3));
+   amdgpu_ring_write(ring, PACKET3_TCL1_ACTION_ENA |
+ PACKET3_TC_ACTION_ENA |
+ PACKET3_SH_KCACHE_ACTION_ENA |
+ PACKET3_SH_ICACHE_ACTION_ENA);
+   amdgpu_ring_write(ring, 0xffffffff);
+   amdgpu_ring_write(ring, 0);
+   amdgpu_ring_write(ring, 10); /* poll interval */
+   }
 }

 /**
  * gfx_v6_0_ring_test_ib - basic ring IB test
  *
  * @ring: amdgpu_ring structure holding ring information
  *
  * Allocate an IB and execute it on the gfx ring (SI).
  * Provides a basic gfx ring test to verify that IBs are working.
  * Returns 0 on success, error on failure.
@@ -3631,21 +3646,21 @@ static const struct amdgpu_ring_funcs 
gfx_v6_0_ring_funcs_gfx = {
.get_rptr = gfx_v6_0_ring_get_rptr,
.get_wptr = gfx_v6_0_ring_get_wptr,
.set_wptr = gfx_v6_0_ring_set_wptr_gfx,
.emit_frame_size =
5 + /* gfx_v6_0_ring_emit_hdp_flush */
5 + /* gfx_v6_0_ring_emit_hdp_invalidate */
14 + 14 + 14 + /* gfx_v6_0_ring_emit_fence x3 for user fence, 
vm fence */
7 + 4 + /* gfx_v6_0_ring_emit_pipeline_sync */
17 + 6 + /* gfx_v6_0_ring_emit_vm_flush */
3 + 2, /* gfx_v6_ring_emit_cntxcntl including vgt flush */
-   .emit_ib_size = 6, /* gfx_v6_0_ring_emit_ib */
+   .emit_ib_size = 6 + 8, /* gfx_v6_0_ring_emit_ib */
.emit_ib = gfx_v6_0_ring_emit_ib,
.emit_fence = gfx_v6_0_ring_emit_fence,
.emit_pipeline_sync = gfx_v6_0_ring_emit_pipeline_sync,
.emit_vm_flush = gfx_v6_0_ring_emit_vm_flush,
.emit_hdp_flush = gfx_v6_0_ring_emit_hdp_flush,
.emit_hdp_invalidate = gfx_v6_0_ring_emit_hdp_invalidate,
.test_ring = gfx_v6_0_ring_test_ring,
.test_ib = gfx_v6_0_ring_test_ib,
.insert_nop = amdgpu_ring_insert_nop,
.emit_cntxcntl = gfx_v6_ring_emit_cntxcntl,
@@ -3657,21 +3672,21 @@ static const struct amdgpu_ring_funcs 
gfx_v6_0_ring_funcs_compute = {
.nop = 0x80000000,
.get_rptr = gfx_v6_0_ring_get_rptr,
.get_wptr = gfx_v6_0_ring_get_wptr,
.set_wptr = gfx_v6_0_ring_set_wptr_compute,
.emit_frame_size =
5 + /* 


Re: FW: [PATCH 1/2] drm/amdgpu:use FRAME_CNTL for new GFX ucode

2017-05-04 Thread Nicolai Hähnle

On 04.05.2017 13:43, Liu, Monk wrote:

And for VI, even if the user runs the old firmware, it is still safe to use the TMZ package.

But I will double-check with the CP team today; thanks for the reminder.


Right, I'm sure that old firmware + your change is safe.

The question is: is it safe to use the new firmware _without_ your 
change? If the answer is "no" for gfx8, we have a problem.


Cheers,
Nicolai




BR Monk

-Original Message-
From: Liu, Monk
Sent: Thursday, May 04, 2017 7:42 PM
To: 'Nicolai Hähnle' ; amd-gfx@lists.freedesktop.org
Subject: RE: FW: [PATCH 1/2] drm/amdgpu:use FRAME_CNTL for new GFX ucode

No, this CP side change is from VI, and ported to AI


-Original Message-
From: Nicolai Hähnle [mailto:nhaeh...@gmail.com]
Sent: Thursday, May 04, 2017 7:33 PM
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Subject: Re: FW: [PATCH 1/2] drm/amdgpu:use FRAME_CNTL for new GFX ucode

On 04.05.2017 12:48, Liu, Monk wrote:



-Original Message-
From: Monk Liu [mailto:monk@amd.com]
Sent: Thursday, May 04, 2017 6:47 PM
To: Liu, Monk 
Cc: Liu, Monk 
Subject: [PATCH 1/2] drm/amdgpu:use FRAME_CNTL for new GFX ucode

VI/AI affected:


I thought this was a gfx9-only change? If it's gfx8 also, we're going to have 
pretty bad compatibility issues when new firmware is used with old kernel...

Cheers,
Nicolai




The CP/HW team requires that the KMD insert a FRAME_CONTROL(end) packet after the last IB and before 
the fence of the DMA frame.

This is to make sure the caches are flushed. It is a mandatory change for both the MCBP/SR-IOV and 
bare-metal cases, because the new CP hardware no longer flushes the caches after 
each IB; it leaves that to the KMD.

With this patch, certain MCBP hang issues when rendering Vulkan/chained-IB workloads are 
resolved.

Change-Id: I34ee7528aa32e704b2850bc6d50774b24c29b840
Signed-off-by: Monk Liu 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   | 3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c| 7 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 7 +++
 5 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0ee4d87..f59a1e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1828,6 +1828,7 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring)
 #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
 #define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d))
 #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
+#define amdgpu_ring_emit_tmz(r, b) (r)->funcs->emit_tmz((r), (b))
 #define amdgpu_ring_pad_ib(r, ib) ((r)->funcs->pad_ib((r), (ib)))
 #define amdgpu_ring_init_cond_exec(r) (r)->funcs->init_cond_exec((r))
 #define amdgpu_ring_patch_cond_exec(r,o) (r)->funcs->patch_cond_exec((r),(o))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 4480e01..11a22fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -206,6 +206,9 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
need_ctx_switch = false;
}

+   if (ring->funcs->emit_tmz)
+   amdgpu_ring_emit_tmz(ring, false);
+
	if (ring->funcs->emit_hdp_invalidate
#ifdef CONFIG_X86_64
	    && !(adev->flags & AMD_IS_APU)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 5786cc3..981ef08 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -142,6 +142,7 @@ struct amdgpu_ring_funcs {
void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t
val);
+   void (*emit_tmz)(struct amdgpu_ring *ring, bool start);
 };

 struct amdgpu_ring {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 4144fc3..90998f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6665,6 +6665,12 @@ static void gfx_v8_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
 	ring->ring[offset] = (ring->ring_size >> 2) - offset + cur;
 }

+static void gfx_v8_0_ring_emit_tmz(struct amdgpu_ring *ring, bool start)
+{
+	amdgpu_ring_write(ring, PACKET3(PACKET3_FRAME_CONTROL, 0));
+	amdgpu_ring_write(ring, FRAME_CMD(start ? 0 : 1)); /* frame_end */
+}
+

 static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
 {
@@ -6946,6 +6952,7 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = {
.em

Re: FW: [PATCH 1/2] drm/amdgpu:use FRAME_CNTL for new GFX ucode

2017-05-04 Thread Nicolai Hähnle

On 04.05.2017 12:48, Liu, Monk wrote:



-Original Message-
From: Monk Liu [mailto:monk@amd.com]
Sent: Thursday, May 04, 2017 6:47 PM
To: Liu, Monk 
Cc: Liu, Monk 
Subject: [PATCH 1/2] drm/amdgpu:use FRAME_CNTL for new GFX ucode

VI/AI affected:


I thought this was a gfx9-only change? If it's gfx8 also, we're going to 
have pretty bad compatibility issues when new firmware is used with old 
kernel...


Cheers,
Nicolai




The CP/HW team requires that the KMD insert a FRAME_CONTROL(end) packet after the last IB and before 
the fence of the DMA frame.

This is to make sure the caches are flushed. It is a mandatory change for both the MCBP/SR-IOV and 
bare-metal cases, because the new CP hardware no longer flushes the caches after 
each IB; it leaves that to the KMD.

With this patch, certain MCBP hang issues when rendering Vulkan/chained-IB workloads are 
resolved.

Change-Id: I34ee7528aa32e704b2850bc6d50774b24c29b840
Signed-off-by: Monk Liu 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h  | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   | 3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c| 7 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 7 +++
 5 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0ee4d87..f59a1e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1828,6 +1828,7 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring)
 #define amdgpu_ring_emit_cntxcntl(r, d) (r)->funcs->emit_cntxcntl((r), (d))
 #define amdgpu_ring_emit_rreg(r, d) (r)->funcs->emit_rreg((r), (d))
 #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), (v))
+#define amdgpu_ring_emit_tmz(r, b) (r)->funcs->emit_tmz((r), (b))
 #define amdgpu_ring_pad_ib(r, ib) ((r)->funcs->pad_ib((r), (ib)))
 #define amdgpu_ring_init_cond_exec(r) (r)->funcs->init_cond_exec((r))
 #define amdgpu_ring_patch_cond_exec(r,o) (r)->funcs->patch_cond_exec((r),(o))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 4480e01..11a22fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -206,6 +206,9 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
need_ctx_switch = false;
}

+   if (ring->funcs->emit_tmz)
+   amdgpu_ring_emit_tmz(ring, false);
+
if (ring->funcs->emit_hdp_invalidate
 #ifdef CONFIG_X86_64
&& !(adev->flags & AMD_IS_APU)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 5786cc3..981ef08 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -142,6 +142,7 @@ struct amdgpu_ring_funcs {
void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t flags);
void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, uint32_t val);
+   void (*emit_tmz)(struct amdgpu_ring *ring, bool start);
 };

 struct amdgpu_ring {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 4144fc3..90998f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6665,6 +6665,12 @@ static void gfx_v8_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
 	ring->ring[offset] = (ring->ring_size >> 2) - offset + cur;
 }

+static void gfx_v8_0_ring_emit_tmz(struct amdgpu_ring *ring, bool start)
+{
+	amdgpu_ring_write(ring, PACKET3(PACKET3_FRAME_CONTROL, 0));
+	amdgpu_ring_write(ring, FRAME_CMD(start ? 0 : 1)); /* frame_end */
+}
+

 static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
 {
@@ -6946,6 +6952,7 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = {
.emit_cntxcntl = gfx_v8_ring_emit_cntxcntl,
.init_cond_exec = gfx_v8_0_ring_emit_init_cond_exec,
.patch_cond_exec = gfx_v8_0_ring_emit_patch_cond_exec,
+   .emit_tmz = gfx_v8_0_ring_emit_tmz,
 };

 static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_compute = {
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 3bf7992..a9ca891 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -3245,6 +3245,12 @@ static void gfx_v9_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
 	ring->ring[offset] = (ring->ring_size>>2) - offset + cur;
 }

+static void gfx_v9_0_ring_emit_tmz(struct amdgpu_ring *ring, bool start)
+{
+	amdgpu_ring_write(ring, PACKET3(PACKET3_FRAME_CONTROL, 0));
+	amdgpu_ring_write(ring, FRAME_CMD(start ? 0 : 1)); /* frame_end */
+}
+
 static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg)
 {
struct amdgpu_device *adev = ring->adev;

Re: [PATCH libdrm 1/1] amdgpu: update marketing names

2017-05-04 Thread Nicolai Hähnle

On 04.05.2017 00:39, Samuel Li wrote:

Change-Id: Ia0aff9dba5889f0c9006923236da1b2adc907f1a
Signed-off-by: Samuel Li 
---
 amdgpu/amdgpu_asic_id.h | 146 +---
 1 file changed, 87 insertions(+), 59 deletions(-)

diff --git a/amdgpu/amdgpu_asic_id.h b/amdgpu/amdgpu_asic_id.h
index 3e7d736..6237193 100644
--- a/amdgpu/amdgpu_asic_id.h
+++ b/amdgpu/amdgpu_asic_id.h
@@ -31,99 +31,124 @@ static struct amdgpu_asic_id_table_t {
const char *marketing_name;
 } const amdgpu_asic_id_table [] = {
{0x6600,0x0,"AMD Radeon HD 8600/8700M"},
-   {0x6600,0x81,   "AMD Radeon R7 M370"},
-   {0x6601,0x0,"AMD Radeon HD 8500M/8700M"},
+   {0x6600,0x81,   "AMD Radeon (TM) R7 M370"},
+   {0x6601,0x0,"AMD Radeon (TM) HD 8500M/8700M"},
{0x6604,0x0,"AMD Radeon R7 M265 Series"},
-   {0x6604,0x81,   "AMD Radeon R7 M350"},
+   {0x6604,0x81,   "AMD Radeon (TM) R7 M350"},
{0x6605,0x0,"AMD Radeon R7 M260 Series"},
-   {0x6605,0x81,   "AMD Radeon R7 M340"},
+   {0x6605,0x81,   "AMD Radeon (TM) R7 M340"},
{0x6606,0x0,"AMD Radeon HD 8790M"},
-   {0x6607,0x0,"AMD Radeon HD8530M"},
+   {0x6607,0x0,"AMD Radeon (TM) HD8530M"},
{0x6608,0x0,"AMD FirePro W2100"},
{0x6610,0x0,"AMD Radeon HD 8600 Series"},
-   {0x6610,0x81,   "AMD Radeon R7 350"},
-   {0x6610,0x83,   "AMD Radeon R5 340"},
+   {0x6610,0x81,   "AMD Radeon (TM) R7 350"},
+   {0x6610,0x83,   "AMD Radeon (TM) R5 340"},
{0x6611,0x0,"AMD Radeon HD 8500 Series"},
{0x6613,0x0,"AMD Radeon HD 8500 series"},
{0x6617,0xC7,   "AMD Radeon R7 240 Series"},
{0x6640,0x0,"AMD Radeon HD 8950"},
-   {0x6640,0x80,   "AMD Radeon R9 M380"},
+   {0x6640,0x80,   "AMD Radeon (TM) R9 M380"},
{0x6646,0x0,"AMD Radeon R9 M280X"},
-   {0x6646,0x80,   "AMD Radeon R9 M470X"},
+   {0x6646,0x80,   "AMD Radeon (TM) R9 M470X"},
{0x6647,0x0,"AMD Radeon R9 M270X"},
-   {0x6647,0x80,   "AMD Radeon R9 M380"},
+   {0x6647,0x80,   "AMD Radeon (TM) R9 M380"},
{0x6649,0x0,"AMD FirePro W5100"},
{0x6658,0x0,"AMD Radeon R7 200 Series"},
{0x665C,0x0,"AMD Radeon HD 7700 Series"},
{0x665D,0x0,"AMD Radeon R7 200 Series"},
-   {0x665F,0x81,   "AMD Radeon R7 300 Series"},
+   {0x665F,0x81,   "AMD Radeon (TM) R7 300 Series"},
{0x6660,0x0,"AMD Radeon HD 8600M Series"},
-   {0x6660,0x81,   "AMD Radeon R5 M335"},
-   {0x6660,0x83,   "AMD Radeon R5 M330"},
+   {0x6660,0x81,   "AMD Radeon (TM) R5 M335"},
+   {0x6660,0x83,   "AMD Radeon (TM) R5 M430"},
{0x6663,0x0,"AMD Radeon HD 8500M Series"},
-   {0x6663,0x83,   "AMD Radeon R5 M320"},
+   {0x6663,0x83,   "AMD Radeon (TM) R5 M320"},
{0x6664,0x0,"AMD Radeon R5 M200 Series"},
{0x6665,0x0,"AMD Radeon R5 M200 Series"},
-   {0x6665,0x83,   "AMD Radeon R5 M320"},
+   {0x6665,0x83,   "AMD Radeon (TM) R5 M320"},
+   {0x6665,0xC3,   "AMD Radeon (TM) R5 M430"},
{0x6667,0x0,"AMD Radeon R5 M200 Series"},
-   {0x666F,0x0,"AMD Radeon HD 8500M"},
+   {0x666F,0x0,"AMD Radeon (TM) R5 M420"},
{0x6780,0x0,"ATI FirePro V (FireGL V) Graphics Adapter"},
{0x678A,0x0,"ATI FirePro V (FireGL V) Graphics Adapter"},
{0x6798,0x0,"AMD Radeon HD 7900 Series"},
{0x679A,0x0,"AMD Radeon HD 7900 Series"},
{0x679B,0x0,"AMD Radeon HD 7900 Series"},
{0x679E,0x0,"AMD Radeon HD 7800 Series"},
-   {0x67A0,0x0,"HAWAII XTGL (67A0)"},
-   {0x67A1,0x0,"HAWAII GL40 (67A1)"},
+   {0x67A0,0x0,"AMD Radeon FirePro W9100"},
+   {0x67A1,0x0,"AMD Radeon FirePro W8100"},
{0x67B0,0x0,"AMD Radeon R9 200 Series"},
-   {0x67B0,0x80,   "AMD Radeon R9 390 Series"},
+   {0x67B0,0x80,   "AMD Radeon (TM) R9 390 Series"},
{0x67B1,0x0,"AMD Radeon R9 200 Series"},
-   {0x67B1,0x80,   "AMD Radeon R9 390 Series"},
+   {0x67B1,0x80,   "AMD Radeon (TM) R9 390 Series"},
{0x67B9,0x0,"AMD Radeon R9 200 Series"},
-   {0x67DF,0xC4,   "AMD Radeon RX 480 Graphics"},
-   {0x67DF,0xC5,   "AMD Radeon RX 470 Graphics"},
-   {0x67DF,0xC7,   "AMD Radeon RX 480 Graphics"},
-   {0x6

Re: [PATCH] drm/amdgpu/gfx: drop max_gs_waves_per_vgt

2017-05-03 Thread Nicolai Hähnle

On 02.05.2017 21:50, Alex Deucher wrote:

We already have this info: max_gs_threads.  Drop the duplicate.


max_gs_waves_per_vgt seems to be the better name for this number, though. 
"Threads" is usually what we call a single work item, of which each wave has 64.
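In numbers, the distinction looks like this (illustrative constant and helper names; 64 items per wave is the GCN wavefront width):

```c
/* GCN wavefronts are 64 work items wide, so a field that counts waves
 * per VGT describes 64x as many "threads" in the item sense. */
enum { GCN_WAVE_SIZE = 64 };

static unsigned gs_items_per_vgt(unsigned max_gs_waves_per_vgt)
{
	return max_gs_waves_per_vgt * GCN_WAVE_SIZE;
}
```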


Cheers,
Nicolai



Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 1 -
 3 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0ee4d87..e7fe649 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -972,7 +972,6 @@ struct amdgpu_gfx_config {
unsigned num_rbs;
unsigned gs_vgt_table_depth;
unsigned gs_prim_buffer_depth;
-   unsigned max_gs_waves_per_vgt;

uint32_t tile_mode_array[32];
uint32_t macrotile_mode_array[16];
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index d40b8ac..8d7e4d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -560,7 +560,7 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file
dev_info.num_tcc_blocks = 
adev->gfx.config.max_texture_channel_caches;
dev_info.gs_vgt_table_depth = 
adev->gfx.config.gs_vgt_table_depth;
dev_info.gs_prim_buffer_depth = 
adev->gfx.config.gs_prim_buffer_depth;
-   dev_info.max_gs_waves_per_vgt = 
adev->gfx.config.max_gs_waves_per_vgt;
+   dev_info.max_gs_waves_per_vgt = adev->gfx.config.max_gs_threads;

return copy_to_user(out, &dev_info,
min((size_t)size, sizeof(dev_info))) ? 
-EFAULT : 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 2b2a2c2..8b281df 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -796,7 +796,6 @@ static void gfx_v9_0_gpu_early_init(struct amdgpu_device 
*adev)
adev->gfx.config.sc_earlyz_tile_fifo_size = 0x4C0;
adev->gfx.config.gs_vgt_table_depth = 32;
adev->gfx.config.gs_prim_buffer_depth = 1792;
-   adev->gfx.config.max_gs_waves_per_vgt = 32;
gb_addr_config = VEGA10_GB_ADDR_CONFIG_GOLDEN;
break;
default:




--
Learn what the world really is,
but never forget how it ought to be.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add parameter to allocate high priority contexts v9

2017-04-29 Thread Nicolai Hähnle

On 29.04.2017 18:30, Andres Rodriguez wrote:



On 2017-04-29 04:34 AM, Nicolai Hähnle wrote:

Thanks for the update!


On 26.04.2017 03:10, Andres Rodriguez wrote:

Add a new context creation parameter to express a global context
priority.

The priority ranking in descending order is as follows:
 * AMDGPU_CTX_PRIORITY_HIGH
 * AMDGPU_CTX_PRIORITY_NORMAL
 * AMDGPU_CTX_PRIORITY_LOW

The driver will attempt to schedule work to the hardware according to
the priorities. No latency or throughput guarantees are provided by
this patch.

This interface intends to service the EGL_IMG_context_priority
extension, and vulkan equivalents.

v2: Instead of using flags, repurpose __pad
v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
v4: Validate usermode priority and store it
v5: Move priority validation into amdgpu_ctx_ioctl(), headline reword
v6: add UAPI note regarding priorities requiring CAP_SYS_ADMIN
v7: remove ctx->priority
v8: added AMDGPU_CTX_PRIORITY_LOW, s/CAP_SYS_ADMIN/CAP_SYS_NICE
v9: change the priority parameter to __s32

Reviewed-by: Emil Velikov 
Reviewed-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 38
---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  4 ++-
 include/uapi/drm/amdgpu_drm.h |  8 +-
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index b43..af75571 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -25,11 +25,19 @@
 #include 
 #include "amdgpu.h"

-static int amdgpu_ctx_init(struct amdgpu_device *adev, struct
amdgpu_ctx *ctx)
+static int amdgpu_ctx_init(struct amdgpu_device *adev,
+   enum amd_sched_priority priority,
+   struct amdgpu_ctx *ctx)
 {
 unsigned i, j;
 int r;

+if (priority < 0 || priority >= AMD_SCHED_PRIORITY_MAX)
+return -EINVAL;
+
+if (priority >= AMD_SCHED_PRIORITY_HIGH && !capable(CAP_SYS_NICE))
+return -EACCES;
+
 memset(ctx, 0, sizeof(*ctx));
 ctx->adev = adev;
 kref_init(&ctx->refcount);
@@ -51,7 +59,7 @@ static int amdgpu_ctx_init(struct amdgpu_device
*adev, struct amdgpu_ctx *ctx)
 struct amdgpu_ring *ring = adev->rings[i];
 struct amd_sched_rq *rq;

-rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_NORMAL];
+rq = &ring->sched.sched_rq[priority];
 r = amd_sched_entity_init(&ring->sched, &ctx->rings[i].entity,
   rq, amdgpu_sched_jobs);
 if (r)
@@ -90,6 +98,7 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)

 static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
 struct amdgpu_fpriv *fpriv,
+enum amd_sched_priority priority,
 uint32_t *id)
 {
 struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
@@ -107,8 +116,9 @@ static int amdgpu_ctx_alloc(struct amdgpu_device
*adev,
 kfree(ctx);
 return r;
 }
+
 *id = (uint32_t)r;
-r = amdgpu_ctx_init(adev, ctx);
+r = amdgpu_ctx_init(adev, priority, ctx);
 if (r) {
 idr_remove(&mgr->ctx_handles, *id);
 *id = 0;
@@ -182,11 +192,27 @@ static int amdgpu_ctx_query(struct amdgpu_device
*adev,
 return 0;
 }

+static enum amd_sched_priority amdgpu_to_sched_priority(int
amdgpu_priority)
+{
+switch (amdgpu_priority) {
+case AMDGPU_CTX_PRIORITY_HIGH:
+return AMD_SCHED_PRIORITY_HIGH;
+case AMDGPU_CTX_PRIORITY_NORMAL:
+return AMD_SCHED_PRIORITY_NORMAL;
+case AMDGPU_CTX_PRIORITY_LOW:
+return AMD_SCHED_PRIORITY_LOW;


This needs to be changed now to support the range.




I actually don't intend on the priority parameter to behave like a
range. libdrm is expected to pass in HIGH/NORMAL/LOW, and nothing in
between.


Okay, makes sense.



The current version of the hardware only supports a handful of discrete
priority configurations. So I would rather avoid creating the illusion
that a priority of -333 is any different than 0.

What I like about your suggestion of spreading out the values further
apart (-1023, 0, 1023 vs -1, 0, +1), is that it gives us the option to
add new priority values and keep everything ordered. Or, we could also
expand into ranges and still maintain backwards compatibility (if the HW
supports it).

The EGL extension and the vulkan draft extension for context priorities
also use discrete values. So I don't really see a case for pursuing
range based priorities when the APIs and the HW don't support it.
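The design point being discussed — discrete levels spread far apart to leave room for future values, without pretending intermediate values are a range — might look like this. The constants and names below are illustrative assumptions, not the final UAPI:

```c
/* Hypothetical spread-out discrete priority values: new levels can be
 * added between these later while keeping the existing ordering. */
enum ctx_priority {
	CTX_PRIORITY_LOW    = -1023,
	CTX_PRIORITY_NORMAL = 0,
	CTX_PRIORITY_HIGH   = 1023,
};

/* Only the supported discrete levels are accepted; a value such as
 * -333 is rejected rather than treated as "somewhere between". */
static int ctx_priority_valid(int p)
{
	return p == CTX_PRIORITY_LOW ||
	       p == CTX_PRIORITY_NORMAL ||
	       p == CTX_PRIORITY_HIGH;
}
```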



+default:
+WARN(1, "Invalid context priority %d\n", amdgpu_priority);
+return AMD_SCHED_PRIORITY_NORMAL;
+}
+}
+
 int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
  struct drm_file *filp)
 {
 int r;
 uint32_t

Re: [PATCH] drm/amdgpu: add parameter to allocate high priority contexts v9

2017-04-29 Thread Nicolai Hähnle

Thanks for the update!


On 26.04.2017 03:10, Andres Rodriguez wrote:

Add a new context creation parameter to express a global context priority.

The priority ranking in descending order is as follows:
 * AMDGPU_CTX_PRIORITY_HIGH
 * AMDGPU_CTX_PRIORITY_NORMAL
 * AMDGPU_CTX_PRIORITY_LOW

The driver will attempt to schedule work to the hardware according to
the priorities. No latency or throughput guarantees are provided by
this patch.

This interface intends to service the EGL_IMG_context_priority
extension, and vulkan equivalents.

v2: Instead of using flags, repurpose __pad
v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
v4: Validate usermode priority and store it
v5: Move priority validation into amdgpu_ctx_ioctl(), headline reword
v6: add UAPI note regarding priorities requiring CAP_SYS_ADMIN
v7: remove ctx->priority
v8: added AMDGPU_CTX_PRIORITY_LOW, s/CAP_SYS_ADMIN/CAP_SYS_NICE
v9: change the priority parameter to __s32

Reviewed-by: Emil Velikov 
Reviewed-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 38 ---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  4 ++-
 include/uapi/drm/amdgpu_drm.h |  8 +-
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index b43..af75571 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -25,11 +25,19 @@
 #include 
 #include "amdgpu.h"

-static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)
+static int amdgpu_ctx_init(struct amdgpu_device *adev,
+  enum amd_sched_priority priority,
+  struct amdgpu_ctx *ctx)
 {
unsigned i, j;
int r;

+   if (priority < 0 || priority >= AMD_SCHED_PRIORITY_MAX)
+   return -EINVAL;
+
+   if (priority >= AMD_SCHED_PRIORITY_HIGH && !capable(CAP_SYS_NICE))
+   return -EACCES;
+
memset(ctx, 0, sizeof(*ctx));
ctx->adev = adev;
kref_init(&ctx->refcount);
@@ -51,7 +59,7 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev, struct 
amdgpu_ctx *ctx)
struct amdgpu_ring *ring = adev->rings[i];
struct amd_sched_rq *rq;

-   rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_NORMAL];
+   rq = &ring->sched.sched_rq[priority];
r = amd_sched_entity_init(&ring->sched, &ctx->rings[i].entity,
  rq, amdgpu_sched_jobs);
if (r)
@@ -90,6 +98,7 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)

 static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
struct amdgpu_fpriv *fpriv,
+   enum amd_sched_priority priority,
uint32_t *id)
 {
struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
@@ -107,8 +116,9 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
kfree(ctx);
return r;
}
+
*id = (uint32_t)r;
-   r = amdgpu_ctx_init(adev, ctx);
+   r = amdgpu_ctx_init(adev, priority, ctx);
if (r) {
idr_remove(&mgr->ctx_handles, *id);
*id = 0;
@@ -182,11 +192,27 @@ static int amdgpu_ctx_query(struct amdgpu_device *adev,
return 0;
 }

+static enum amd_sched_priority amdgpu_to_sched_priority(int amdgpu_priority)
+{
+   switch (amdgpu_priority) {
+   case AMDGPU_CTX_PRIORITY_HIGH:
+   return AMD_SCHED_PRIORITY_HIGH;
+   case AMDGPU_CTX_PRIORITY_NORMAL:
+   return AMD_SCHED_PRIORITY_NORMAL;
+   case AMDGPU_CTX_PRIORITY_LOW:
+   return AMD_SCHED_PRIORITY_LOW;


This needs to be changed now to support the range.



+   default:
+   WARN(1, "Invalid context priority %d\n", amdgpu_priority);
+   return AMD_SCHED_PRIORITY_NORMAL;
+   }
+}
+
 int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
 struct drm_file *filp)
 {
int r;
uint32_t id;
+   enum amd_sched_priority priority;

union drm_amdgpu_ctx *args = data;
struct amdgpu_device *adev = dev->dev_private;
@@ -194,10 +220,14 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,

r = 0;
id = args->in.ctx_id;
+   priority = amdgpu_to_sched_priority(args->in.priority);
+
+   if (priority >= AMD_SCHED_PRIORITY_MAX)
+   return -EINVAL;


We check ioctl parameters before using them. In this case, the range 
check should happen before anything else. Misbehaving user-space programs 
shouldn't be able to trigger the WARN in amdgpu_to_sched_priority that 
easily, and most of all their ioctls shouldn't silently succeed. 
Otherwise, we limit our ability to evolve the interface.


Cheers,
Nicolai




switch (args->in.op) {

Re: [PATCH] drm: Harmonize CIK ASIC support in radeon and amdgpu (v2)

2017-04-25 Thread Nicolai Hähnle

On 25.04.2017 08:28, Michel Dänzer wrote:

On 22/04/17 02:05 AM, Felix Kuehling wrote:

__setup doesn't work in modules.


Right. We could build something like
drivers/video/fbdev/core/fb_cmdline.c:video_setup() into the kernel to
handle this, but it's a bit ugly, which is one reason why I was leaning
towards:



s8250_options is only compiled if the driver is not a module.


That doesn't prevent us from using __module_param_call directly, does it?

Although, that still doesn't work as I envision if only one driver's
option is set e.g. in /etc/modprobe.d/*.conf .


So, I'm starting to think we need a shared module for this, which
provides one or multiple module parameters to choose which driver to use
for CIK/SI[0], and provides the result to the amdgpu/radeon drivers.
That might make things easier for amdgpu-pro / other standalone amdgpu
versions in the long run as well, as they could add files to
/etc/modprobe.d/ choosing themselves by default, without having to
blacklist radeon.

What do you guys think?


I suspect that adding an entire module just to select between two other 
modules is the kind of thing that should be discussed with a wider 
audience first.


It is probably the cleanest solution that doesn't require teaching the 
general modprobe architecture about having multiple modules for the same 
PCI ID...
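For illustration, the shared-module idea could look roughly like the following modprobe.d fragment — note that the module name "amdgpu-select" and its parameters are entirely hypothetical here, just to make the proposal concrete:

```
# /etc/modprobe.d/cik-driver.conf
# Hypothetical sketch: a small shared module exposes parameters that
# amdgpu and radeon consult at probe time, so neither driver needs to
# be blacklisted outright.
options amdgpu-select cik=amdgpu si=radeon
```

A standalone amdgpu package could then ship its own file in /etc/modprobe.d/ selecting itself by default.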


Cheers,
Nicolai





[0] or possibly even more fine-grained in the future




--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: add parameter to allocate high priority contexts v8

2017-04-25 Thread Nicolai Hähnle

On 24.04.2017 18:20, Andres Rodriguez wrote:

Add a new context creation parameter to express a global context priority.

The priority ranking in descending order is as follows:
 * AMDGPU_CTX_PRIORITY_HIGH
 * AMDGPU_CTX_PRIORITY_NORMAL
 * AMDGPU_CTX_PRIORITY_LOW

The driver will attempt to schedule work to the hardware according to
the priorities. No latency or throughput guarantees are provided by
this patch.

This interface intends to service the EGL_IMG_context_priority
extension, and vulkan equivalents.

v2: Instead of using flags, repurpose __pad
v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
v4: Validate usermode priority and store it
v5: Move priority validation into amdgpu_ctx_ioctl(), headline reword
v6: add UAPI note regarding priorities requiring CAP_SYS_ADMIN
v7: remove ctx->priority
v8: added AMDGPU_CTX_PRIORITY_LOW, s/CAP_SYS_ADMIN/CAP_SYS_NICE


Reviewed-by: Emil Velikov 
Reviewed-by: Christian König 
Signed-off-by: Andres Rodriguez 


I didn't follow all the discussion, so feel free to shut me up if this 
has already been discussed, but...



[snip]

+/* Context priority level */
+#define AMDGPU_CTX_PRIORITY_NORMAL 0
+#define AMDGPU_CTX_PRIORITY_LOW1
+/* Selecting a priority above NORMAL requires CAP_SYS_ADMIN */
+#define AMDGPU_CTX_PRIORITY_HIGH   2
+#define AMDGPU_CTX_PRIORITY_NUM3


I get that normal priority needs to be 0 for backwards compatibility, 
but having LOW between NORMAL and HIGH is still odd. Have you considered 
using signed integers as a way to fix that?


(AMDGPU_CTX_PRIORITY_NUM doesn't seem to be used anywhere...)

Cheers,
Nicolai



+
 struct drm_amdgpu_ctx_in {
/** AMDGPU_CTX_OP_* */
__u32   op;
/** For future use, no flags defined so far */
__u32   flags;
__u32   ctx_id;
-   __u32   _pad;
+   __u32   priority;
 };

 union drm_amdgpu_ctx_out {
struct {
__u32   ctx_id;
__u32   _pad;
} alloc;

struct {
/** For future use, no flags defined so far */
__u64   flags;
/** Number of resets caused by this context so far. */
__u32   hangs;
/** Reset status since the last call of the ioctl. */
__u32   reset_status;
} state;
 };

 union drm_amdgpu_ctx {
struct drm_amdgpu_ctx_in in;






Re: [PATCH libdrm] amdgpu: Use the canonical form in branch predicate

2017-04-24 Thread Nicolai Hähnle

On 22.04.2017 08:47, Edward O'Callaghan wrote:

Suggested-by: Emil Velikov 
Signed-off-by: Edward O'Callaghan 


Reviewed-by: Nicolai Hähnle 


---
 amdgpu/amdgpu_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 0993a6d..868eb7b 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -559,7 +559,7 @@ int amdgpu_cs_wait_semaphore(amdgpu_context_handle ctx,
if (ring >= AMDGPU_CS_MAX_RINGS)
return -EINVAL;
/* must signal first */
-   if (NULL == sem->signal_fence.context)
+   if (!sem->signal_fence.context)
return -EINVAL;

pthread_mutex_lock(&ctx->sequence_mutex);






Re: [PATCH] drm/amdgpu: fix VM clearing in amdgpu_gem_object_close

2017-04-21 Thread Nicolai Hähnle

On 21.04.2017 10:07, Christian König wrote:

From: Christian König 

We need to check if the VM is swapped out before trying to update it.

Signed-off-by: Christian König 


Oops. That makes a lot of sense...

Fixes: 23e0563e48f7 ("drm/amdgpu: clear freed mappings immediately when BO may be freed")

Reviewed-by: Nicolai Hähnle 



---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 68 ++---
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 0386015..67be795 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -139,6 +139,35 @@ int amdgpu_gem_object_open(struct drm_gem_object *obj,
return 0;
 }

+static int amdgpu_gem_vm_check(void *param, struct amdgpu_bo *bo)
+{
+   /* if anything is swapped out don't swap it in here,
+  just abort and wait for the next CS */
+   if (!amdgpu_bo_gpu_accessible(bo))
+   return -ERESTARTSYS;
+
+   if (bo->shadow && !amdgpu_bo_gpu_accessible(bo->shadow))
+   return -ERESTARTSYS;
+
+   return 0;
+}
+
+static bool amdgpu_gem_vm_ready(struct amdgpu_device *adev,
+   struct amdgpu_vm *vm,
+   struct list_head *list)
+{
+   struct ttm_validate_buffer *entry;
+
+   list_for_each_entry(entry, list, head) {
+   struct amdgpu_bo *bo =
+   container_of(entry->bo, struct amdgpu_bo, tbo);
+   if (amdgpu_gem_vm_check(NULL, bo))
+   return false;
+   }
+
+   return !amdgpu_vm_validate_pt_bos(adev, vm, amdgpu_gem_vm_check, NULL);
+}
+
 void amdgpu_gem_object_close(struct drm_gem_object *obj,
 struct drm_file *file_priv)
 {
@@ -148,15 +177,13 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,
struct amdgpu_vm *vm = &fpriv->vm;

struct amdgpu_bo_list_entry vm_pd;
-   struct list_head list, duplicates;
+   struct list_head list;
struct ttm_validate_buffer tv;
struct ww_acquire_ctx ticket;
struct amdgpu_bo_va *bo_va;
-   struct fence *fence = NULL;
int r;

INIT_LIST_HEAD(&list);
-   INIT_LIST_HEAD(&duplicates);

tv.bo = &bo->tbo;
tv.shared = true;
@@ -164,16 +191,18 @@ void amdgpu_gem_object_close(struct drm_gem_object *obj,

amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);

-   r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
+   r = ttm_eu_reserve_buffers(&ticket, &list, false, NULL);
if (r) {
dev_err(adev->dev, "leaking bo va because "
"we fail to reserve bo (%d)\n", r);
return;
}
bo_va = amdgpu_vm_bo_find(vm, bo);
-   if (bo_va) {
-   if (--bo_va->ref_count == 0) {
-   amdgpu_vm_bo_rmv(adev, bo_va);
+   if (bo_va && --bo_va->ref_count == 0) {
+   amdgpu_vm_bo_rmv(adev, bo_va);
+
+   if (amdgpu_gem_vm_ready(adev, vm, &list)) {
+   struct fence *fence = NULL;

r = amdgpu_vm_clear_freed(adev, vm, &fence);
if (unlikely(r)) {
@@ -504,19 +533,6 @@ int amdgpu_gem_metadata_ioctl(struct drm_device *dev, void 
*data,
return r;
 }

-static int amdgpu_gem_va_check(void *param, struct amdgpu_bo *bo)
-{
-   /* if anything is swapped out don't swap it in here,
-  just abort and wait for the next CS */
-   if (!amdgpu_bo_gpu_accessible(bo))
-   return -ERESTARTSYS;
-
-   if (bo->shadow && !amdgpu_bo_gpu_accessible(bo->shadow))
-   return -ERESTARTSYS;
-
-   return 0;
-}
-
 /**
  * amdgpu_gem_va_update_vm -update the bo_va in its VM
  *
@@ -535,19 +551,9 @@ static void amdgpu_gem_va_update_vm(struct amdgpu_device 
*adev,
struct list_head *list,
uint32_t operation)
 {
-   struct ttm_validate_buffer *entry;
int r = -ERESTARTSYS;

-   list_for_each_entry(entry, list, head) {
-   struct amdgpu_bo *bo =
-   container_of(entry->bo, struct amdgpu_bo, tbo);
-   if (amdgpu_gem_va_check(NULL, bo))
-   goto error;
-   }
-
-   r = amdgpu_vm_validate_pt_bos(adev, vm, amdgpu_gem_va_check,
- NULL);
-   if (r)
+   if (!amdgpu_gem_vm_ready(adev, vm, list))
goto error;

r = amdgpu_vm_update_directories(adev, vm);






Re: [libdrm] amdgpu/: concisely && consistently check null ptrs in canonical form

2017-04-18 Thread Nicolai Hähnle

On 18.04.2017 18:20, Edward O'Callaghan wrote:

Be consistent and use the canonical form while sanity checking
null pointers, also combine a few branches for brevity.

Signed-off-by: Edward O'Callaghan 


Sure, it's a good cleanup. Feel free to add the corresponding change 
also to amdgpu_cs_wait_fences which I've already pushed.


Reviewed-by: Nicolai Hähnle 



---
 amdgpu/amdgpu_bo.c   |  2 +-
 amdgpu/amdgpu_cs.c   | 36 +++-
 amdgpu/amdgpu_gpu_info.c |  5 +++--
 3 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/amdgpu/amdgpu_bo.c b/amdgpu/amdgpu_bo.c
index 9adfffa..5ac456b 100644
--- a/amdgpu/amdgpu_bo.c
+++ b/amdgpu/amdgpu_bo.c
@@ -652,7 +652,7 @@ int amdgpu_bo_list_update(amdgpu_bo_list_handle handle,
return -EINVAL;

list = malloc(number_of_resources * sizeof(struct 
drm_amdgpu_bo_list_entry));
-   if (list == NULL)
+   if (!list)
return -ENOMEM;

args.in.operation = AMDGPU_BO_LIST_OP_UPDATE;
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index fb5b3a8..7fbba96 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -59,13 +59,11 @@ int amdgpu_cs_ctx_create(amdgpu_device_handle dev,
int i, j, k;
int r;

-   if (NULL == dev)
-   return -EINVAL;
-   if (NULL == context)
+   if (!dev || !context)
return -EINVAL;

gpu_context = calloc(1, sizeof(struct amdgpu_context));
-   if (NULL == gpu_context)
+   if (!gpu_context)
return -ENOMEM;

gpu_context->dev = dev;
@@ -110,7 +108,7 @@ int amdgpu_cs_ctx_free(amdgpu_context_handle context)
int i, j, k;
int r;

-   if (NULL == context)
+   if (!context)
return -EINVAL;

pthread_mutex_destroy(&context->sequence_mutex);
@@ -330,9 +328,7 @@ int amdgpu_cs_submit(amdgpu_context_handle context,
uint32_t i;
int r;

-   if (NULL == context)
-   return -EINVAL;
-   if (NULL == ibs_request)
+   if (!context || !ibs_request)
return -EINVAL;

r = 0;
@@ -416,11 +412,7 @@ int amdgpu_cs_query_fence_status(struct amdgpu_cs_fence 
*fence,
bool busy = true;
int r;

-   if (NULL == fence)
-   return -EINVAL;
-   if (NULL == expired)
-   return -EINVAL;
-   if (NULL == fence->context)
+   if (!fence || !expired || !fence->context)
return -EINVAL;
if (fence->ip_type >= AMDGPU_HW_IP_NUM)
return -EINVAL;
@@ -447,11 +439,11 @@ int amdgpu_cs_create_semaphore(amdgpu_semaphore_handle 
*sem)
 {
struct amdgpu_semaphore *gpu_semaphore;

-   if (NULL == sem)
+   if (!sem)
return -EINVAL;

gpu_semaphore = calloc(1, sizeof(struct amdgpu_semaphore));
-   if (NULL == gpu_semaphore)
+   if (!gpu_semaphore)
return -ENOMEM;

atomic_set(&gpu_semaphore->refcount, 1);
@@ -466,14 +458,12 @@ int amdgpu_cs_signal_semaphore(amdgpu_context_handle ctx,
   uint32_t ring,
   amdgpu_semaphore_handle sem)
 {
-   if (NULL == ctx)
+   if (!ctx || !sem)
return -EINVAL;
if (ip_type >= AMDGPU_HW_IP_NUM)
return -EINVAL;
if (ring >= AMDGPU_CS_MAX_RINGS)
return -EINVAL;
-   if (NULL == sem)
-   return -EINVAL;
/* sem has been signaled */
if (sem->signal_fence.context)
return -EINVAL;
@@ -494,14 +484,12 @@ int amdgpu_cs_wait_semaphore(amdgpu_context_handle ctx,
 uint32_t ring,
 amdgpu_semaphore_handle sem)
 {
-   if (NULL == ctx)
+   if (!ctx || !sem)
return -EINVAL;
if (ip_type >= AMDGPU_HW_IP_NUM)
return -EINVAL;
if (ring >= AMDGPU_CS_MAX_RINGS)
return -EINVAL;
-   if (NULL == sem)
-   return -EINVAL;
/* must signal first */
if (NULL == sem->signal_fence.context)
return -EINVAL;
@@ -514,9 +502,7 @@ int amdgpu_cs_wait_semaphore(amdgpu_context_handle ctx,

 static int amdgpu_cs_reset_sem(amdgpu_semaphore_handle sem)
 {
-   if (NULL == sem)
-   return -EINVAL;
-   if (NULL == sem->signal_fence.context)
+   if (!sem || !sem->signal_fence.context)
return -EINVAL;

sem->signal_fence.context = NULL;;
@@ -530,7 +516,7 @@ static int amdgpu_cs_reset_sem(amdgpu_semaphore_handle sem)

 static int amdgpu_cs_unreference_sem(amdgpu_semaphore_handle sem)
 {
-   if (NULL == sem)
+   if (!sem)
return -EINVAL;

if (update_references(&sem->refcount, NULL))
diff --git a/amdgpu/amdgpu_gpu_info.c b/amdgpu/amdgpu_gpu_info.c
index f4b94c9..1efffc6 10

Re: [PATCH libdrm 1/2] amdgpu: add the interface of waiting multiple fences

2017-04-18 Thread Nicolai Hähnle

On 18.04.2017 17:47, Edward O'Callaghan wrote:

On 04/14/2017 12:47 AM, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

Signed-off-by: Junwei Zhang 
[v2: allow returning the first signaled fence index]
Signed-off-by: monk.liu 
[v3:
 - cleanup *status setting
 - fix amdgpu symbols check]
Signed-off-by: Nicolai Hähnle 
Reviewed-by: Christian König  (v1)
Reviewed-by: Jammy Zhou  (v1)
---
 amdgpu/amdgpu-symbol-check |  1 +
 amdgpu/amdgpu.h| 23 ++
 amdgpu/amdgpu_cs.c | 74 ++
 3 files changed, 98 insertions(+)


[snip]

+static int amdgpu_ioctl_wait_fences(struct amdgpu_cs_fence *fences,
+   uint32_t fence_count,
+   bool wait_all,
+   uint64_t timeout_ns,
+   uint32_t *status,
+   uint32_t *first)
+{
+   struct drm_amdgpu_fence *drm_fences;
+   amdgpu_device_handle dev = fences[0].context->dev;
+   union drm_amdgpu_wait_fences args;
+   int r;
+   uint32_t i;
+
+   drm_fences = alloca(sizeof(struct drm_amdgpu_fence) * fence_count);
+   for (i = 0; i < fence_count; i++) {
+   drm_fences[i].ctx_id = fences[i].context->id;
+   drm_fences[i].ip_type = fences[i].ip_type;
+   drm_fences[i].ip_instance = fences[i].ip_instance;
+   drm_fences[i].ring = fences[i].ring;
+   drm_fences[i].seq_no = fences[i].fence;
+   }
+
+   memset(&args, 0, sizeof(args));
+   args.in.fences = (uint64_t)(uintptr_t)drm_fences;
+   args.in.fence_count = fence_count;
+   args.in.wait_all = wait_all;
+   args.in.timeout_ns = amdgpu_cs_calculate_timeout(timeout_ns);
+
+   r = drmIoctl(dev->fd, DRM_IOCTL_AMDGPU_WAIT_FENCES, &args);
+   if (r)
+   return -errno;


Hi Nicolai,

you will leak drm_fences here on the error branch.


It's an alloca, so it's automatically freed when the function returns.



+
+   *status = args.out.status;
+
+   if (first)
+   *first = args.out.first_signaled;
+
+   return 0;
+}
+
+int amdgpu_cs_wait_fences(struct amdgpu_cs_fence *fences,
+ uint32_t fence_count,
+ bool wait_all,
+ uint64_t timeout_ns,
+ uint32_t *status,
+ uint32_t *first)
+{
+   uint32_t i;
+   int r;


no need for an intermediate ret, just return amdgpu_ioctl_wait_fences()
directly?


Good point, I'll change that before I push.



+
+   /* Sanity check */
+   if (NULL == fences)
+   return -EINVAL;
+   if (NULL == status)
+   return -EINVAL;
+   if (fence_count <= 0)
+   return -EINVAL;


may as well combine these branches?

if (!fences || !status || !fence_count)
return -EINVAL;

as fence_count is unsigned.


Yeah, that makes some sense, but I decided to keep the separate 
if-statements because other functions are written like this as well.


Thanks,
Nicolai





Kind Regards,
Edward.


+   for (i = 0; i < fence_count; i++) {
+   if (NULL == fences[i].context)
+   return -EINVAL;
+   if (fences[i].ip_type >= AMDGPU_HW_IP_NUM)
+   return -EINVAL;
+   if (fences[i].ring >= AMDGPU_CS_MAX_RINGS)
+   return -EINVAL;
+   }
+
+   *status = 0;
+
+   r = amdgpu_ioctl_wait_fences(fences, fence_count, wait_all, timeout_ns,
+status, first);
+
+   return r;
+}
+
 int amdgpu_cs_create_semaphore(amdgpu_semaphore_handle *sem)
 {
struct amdgpu_semaphore *gpu_semaphore;

if (NULL == sem)
return -EINVAL;

gpu_semaphore = calloc(1, sizeof(struct amdgpu_semaphore));
if (NULL == gpu_semaphore)
return -ENOMEM;








Re: [PATCH] drm/amdgpu: PRT support for gfx9

2017-04-18 Thread Nicolai Hähnle

On 18.04.2017 05:13, Zhang, Jerry (Junwei) wrote:

On 04/18/2017 10:47 AM, zhoucm1 wrote:



On 18.04.2017 09:51, Zhang, Jerry (Junwei) wrote:


Anyone could help to review it?

On 04/17/2017 05:04 PM, Junwei Zhang wrote:

Signed-off-by: Junwei Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 5 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 1 +
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 2 +-
  3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 9ff445c..51aedf9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1269,6 +1269,11 @@ int amdgpu_vm_bo_update(struct amdgpu_device
*adev,
  spin_unlock(&vm->status_lock);

  list_for_each_entry(mapping, &bo_va->invalids, list) {
+if (mapping->flags & AMDGPU_PTE_TILED) {
+flags |= AMDGPU_PTE_TILED;
+flags &= ~AMDGPU_PTE_VALID;


Why do you need to explicitly clear the VALID bit here? I'd expect 
whoever creates the mapping to already ensure that the VALID bit is cleared.




+}
+

How about clear operation?


The CLEAR op will clear all mappings with flags=0, put them onto the free
list, and then clear them via amdgpu_vm_clear_freed().

When amdgpu_vm_bo_update() is performed, the mapping's flag is 0 now.




  r = amdgpu_vm_bo_split_mapping(adev, exclusive,
 gtt_flags, pages_addr, vm,
 mapping, flags, mem,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 4904740..8d25914 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -70,6 +70,7 @@
  /* VEGA10 only */
  #define AMDGPU_PTE_MTYPE(a)((uint64_t)a << 57)
  #define AMDGPU_PTE_MTYPE_MASKAMDGPU_PTE_MTYPE(3ULL)
+#define AMDGPU_PTE_TILED(1ULL << 51)

   /* How to program VM fault handling */
  #define AMDGPU_VM_FAULT_STOP_NEVER0
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 51a1919..6d033ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -324,7 +324,7 @@ static uint64_t gmc_v9_0_get_vm_pte_flags(struct
amdgpu_device *adev,
  }

  if (flags & AMDGPU_VM_PAGE_PRT)
-pte_flag |= AMDGPU_PTE_PRT;
+pte_flag |= AMDGPU_PTE_TILED;

PTE_PRT name doesn't make sense?


This naming is tricky for the PRT feature, as there is no PRT bit in the
PTE for pre-gfx9 ASICs, so PTE_PRT is carried in a reserved bit of the
PTE. But on gfx9 there is an actual bit in the PTE, and it's better to
use the real one.


Wouldn't it be better to keep the AMDGPU_PTE_PRT name, and just make 
sure that the bit isn't set on older ASICs (unless the bit is simply 
ignored on older ASICs, in which case we don't have to bother either way)?


Cheers,
Nicolai




BTW, all ASICs will use the general flag in UMD now, but with different
handling inside the PTE.

Jerry



Regards,
David Zhou


  return pte_flag;
  }







Re: [PATCH libdrm 0/2] amdgpu: add amdgpu_cs_wait_fences

2017-04-18 Thread Nicolai Hähnle

Post-Easter ping - anybody want to give their R-b?

Thanks,
Nicolai

On 13.04.2017 16:47, Nicolai Hähnle wrote:

Hi all,

These changes expose a function to call the WAIT_FENCES ioctl for
waiting on multiple fences at the same time. This is useful for
Vulkan.

They are mostly changes that have been in the amdgpu-pro libdrm
for a long time. I've taken the liberty to clean them up a bit
and add some missing bits.

Please review!
Thanks,
Nicolai





