Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator
On Thu, Apr 20, 2017 at 01:29:11PM +0100, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin> +static void > +run_workload(unsigned int id, struct workload *wrk, unsigned int repeat, > + enum intel_engine_id (*balance)(struct workload *wrk, > + struct w_step *w), > + unsigned int flags) > +{ > + struct timespec t_start, t_end; > + struct w_step *w; > + double t; > + int i, j; > + > + clock_gettime(CLOCK_MONOTONIC, _start); > + > + srand(t_start.tv_nsec); Let's supply the seed with the workload specification. And use a portable prng so we can be sure we can reproduce results from one system to the next. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator
On Thu, Apr 20, 2017 at 03:34:56PM +0100, Tvrtko Ursulin wrote: > > On 20/04/2017 15:23, Chris Wilson wrote: > >On Thu, Apr 20, 2017 at 01:29:11PM +0100, Tvrtko Ursulin wrote: > >>+static void > >>+alloc_step_batch(struct workload *wrk, struct w_step *w, struct w_step_eb > >>*b, > >>+enum intel_engine_id engine, unsigned int flags) > >>+{ > >>+ unsigned int bb_i, j = 0; > >>+ > >>+ b->obj[j].handle = gem_create(fd, 4096); > >>+ b->obj[j].flags = EXEC_OBJECT_WRITE; > >>+ j++; > >>+ > >>+ if (flags & SEQNO) { > >>+ b->obj[j].handle = wrk->status_page_handle; > >>+ j++; > >>+ } > >>+ > >>+ bb_i = j++; > >>+ b->bb_sz = get_bb_sz(w->duration.max); > >>+ b->bb_handle = b->obj[bb_i].handle = gem_create(fd, b->bb_sz); > >>+ terminate_bb(w, b, engine, flags); > >>+ > >>+ igt_assert(w->dependency <= 0); > >>+ if (w->dependency) { > >>+ int dep_idx = w->idx + w->dependency; > >>+ > >>+ igt_assert(dep_idx >= 0 && dep_idx < wrk->nr_steps); > >>+ igt_assert(wrk->steps[dep_idx].type == BATCH); > >>+ > >>+ b->obj[j].handle = b->obj[bb_i].handle; > >>+ bb_i = j; > >>+ b->obj[j - 1].handle = wrk->steps[dep_idx].b[0].obj[0].handle; > >>+ j++; > >>+ > >>+ if (wrk->steps[dep_idx].b[1].obj[0].handle) { > >>+ b->obj[j].handle = b->obj[bb_i].handle; > >>+ bb_i = j; > >>+ b->obj[j - 1].handle = > >>+ wrk->steps[dep_idx].b[1].obj[0].handle; > >>+ j++; > >>+ } > >>+ } > >>+ > >>+ if (flags & SEQNO) { > >>+ b->reloc.presumed_offset = -1; > > > >So as I understand it, you are caching the execbuf/obj/reloc for the > >workload and then may reissue later with different seqno on different > >rings? In which case we have a problem as the kernel will write back the > > > > >updated offsets to b->reloc.presumed_offset and b->obj[].offset and in > >future passes they will match and the seqno write will go into the wrong > >slot (if it swaps rings). > > > >You either want to reset presumed_offset=-1 each time, or better for all > >concerned write the correct address alongside the seqno (which also > >enables NORELOC). > > > >Delta incoming. > > Only the seqno changes, but each engine has its own eb/obj/reloc. So > I think there is no problem. Or is there still? bb_handle is per engine as well. Ugh. No, that seems like you are self-consistent, you just need to remove the -1 and your code is NORELOC correct. I wouldn't go so far as having entirely separate batches for each engine though :) -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator
On 20/04/2017 15:52, Chris Wilson wrote: On Thu, Apr 20, 2017 at 01:29:11PM +0100, Tvrtko Ursulin wrote: + wrk->nr_bb[engine]++; + + if (engine == VCS && balance) { + engine = balance(wrk, w); + wrk->nr_bb[engine]++; + b = >b[engine - VCS1]; + + if (flags & SEQNO) + update_bb_seqno(b, engine, + ++wrk->seqno[engine]); + } + + if (w->duration.min != w->duration.max) { + unsigned int d = get_duration(>duration); + unsigned long offset; + + offset = ALIGN(b->bb_sz - get_bb_sz(d), + 2 * sizeof(uint32_t)); + b->eb.batch_start_offset = offset; + } + + gem_execbuf(fd, >eb); Likely double counting wrk->nr_bb. I suggest placing it next to the gem_execbuf(). Just convenience in balancing mode so that nr(VCS) = nr(VCS1) + nr(VCS2). Also from a different angle, if the sum does not hold, that means workload had auto-balancing and explicit VCS1/2 batches. It's only used to print out the stats at the end of the run. Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator
On Thu, Apr 20, 2017 at 01:29:11PM +0100, Tvrtko Ursulin wrote: > + wrk->nr_bb[engine]++; > + > + if (engine == VCS && balance) { > + engine = balance(wrk, w); > + wrk->nr_bb[engine]++; > + b = >b[engine - VCS1]; > + > + if (flags & SEQNO) > + update_bb_seqno(b, engine, > + ++wrk->seqno[engine]); > + } > + > + if (w->duration.min != w->duration.max) { > + unsigned int d = get_duration(>duration); > + unsigned long offset; > + > + offset = ALIGN(b->bb_sz - get_bb_sz(d), > +2 * sizeof(uint32_t)); > + b->eb.batch_start_offset = offset; > + } > + > + gem_execbuf(fd, >eb); Likely double counting wrk->nr_bb. I suggest placing it next to the gem_execbuf(). -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator
On 20/04/2017 15:33, Chris Wilson wrote: On Thu, Apr 20, 2017 at 03:23:27PM +0100, Chris Wilson wrote: You either want to reset presumed_offset=-1 each time, or better for all concerned write the correct address alongside the seqno (which also enables NORELOC). Delta incoming. See attached. Next concern is that I have full rings which implies that we are not waiting on each batch before resubmitting with a new seqno? If I throw a assert(!busy(batch_bo)) before the *b->mapped_seqno am I going to be upset? Yes you would. :) I had a sync (as a move to cpu domain) before seqno update in the last version but it disappeared as I was fixing the whole area of seqno tracking. So the balancing results in the patch are bogus since the seqno can jump to latest ahead of the time... Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator
On 20/04/2017 15:23, Chris Wilson wrote: On Thu, Apr 20, 2017 at 01:29:11PM +0100, Tvrtko Ursulin wrote: +static void +alloc_step_batch(struct workload *wrk, struct w_step *w, struct w_step_eb *b, +enum intel_engine_id engine, unsigned int flags) +{ + unsigned int bb_i, j = 0; + + b->obj[j].handle = gem_create(fd, 4096); + b->obj[j].flags = EXEC_OBJECT_WRITE; + j++; + + if (flags & SEQNO) { + b->obj[j].handle = wrk->status_page_handle; + j++; + } + + bb_i = j++; + b->bb_sz = get_bb_sz(w->duration.max); + b->bb_handle = b->obj[bb_i].handle = gem_create(fd, b->bb_sz); + terminate_bb(w, b, engine, flags); + + igt_assert(w->dependency <= 0); + if (w->dependency) { + int dep_idx = w->idx + w->dependency; + + igt_assert(dep_idx >= 0 && dep_idx < wrk->nr_steps); + igt_assert(wrk->steps[dep_idx].type == BATCH); + + b->obj[j].handle = b->obj[bb_i].handle; + bb_i = j; + b->obj[j - 1].handle = wrk->steps[dep_idx].b[0].obj[0].handle; + j++; + + if (wrk->steps[dep_idx].b[1].obj[0].handle) { + b->obj[j].handle = b->obj[bb_i].handle; + bb_i = j; + b->obj[j - 1].handle = + wrk->steps[dep_idx].b[1].obj[0].handle; + j++; + } + } + + if (flags & SEQNO) { + b->reloc.presumed_offset = -1; So as I understand it, you are caching the execbuf/obj/reloc for the workload and then may reissue later with different seqno on different rings? In which case we have a problem as the kernel will write back the > > updated offsets to b->reloc.presumed_offset and b->obj[].offset and in future passes they will match and the seqno write will go into the wrong slot (if it swaps rings). You either want to reset presumed_offset=-1 each time, or better for all concerned write the correct address alongside the seqno (which also enables NORELOC). Delta incoming. Only the seqno changes, but each engine has its own eb/obj/reloc. So I think there is no problem. Or is there still? Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator
On Thu, Apr 20, 2017 at 03:23:27PM +0100, Chris Wilson wrote: > You either want to reset presumed_offset=-1 each time, or better for all > concerned write the correct address alongside the seqno (which also > enables NORELOC). > > Delta incoming. See attached. Next concern is that I have full rings which implies that we are not waiting on each batch before resubmitting with a new seqno? If I throw a assert(!busy(batch_bo)) before the *b->mapped_seqno am I going to be upset? -Chris -- Chris Wilson, Intel Open Source Technology Centre >From 5bf424c2719e81f926b74f4136610cbdfd26a4d8 Mon Sep 17 00:00:00 2001 From: Chris WilsonDate: Thu, 20 Apr 2017 15:30:07 +0100 Subject: [PATCH] seqno-reloc --- benchmarks/gem_wsim.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/benchmarks/gem_wsim.c b/benchmarks/gem_wsim.c index adf2d6de..e616335b 100644 --- a/benchmarks/gem_wsim.c +++ b/benchmarks/gem_wsim.c @@ -45,6 +45,8 @@ #include "drmtest.h" #include "intel_io.h" +#define LOCAL_I915_GEM_DOMAIN_WC 0x80 + enum intel_engine_id { RCS, BCS, @@ -86,7 +88,9 @@ struct w_step struct drm_i915_gem_relocation_entry reloc; unsigned long bb_sz; uint32_t bb_handle; - uint32_t *mapped_batch, *mapped_seqno; + uint32_t *mapped_batch; + uint64_t *mapped_address; + uint32_t *mapped_seqno; unsigned int mapped_len; } b[2]; /* One for each VCS when load balancing */ }; @@ -405,7 +409,8 @@ terminate_bb(struct w_step *w, struct w_step_eb *b, enum intel_engine_id engine, mmap_len += cmd_offset - mmap_start; gem_set_domain(fd, b->bb_handle, - I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU); + LOCAL_I915_GEM_DOMAIN_WC, + LOCAL_I915_GEM_DOMAIN_WC); ptr = gem_mmap__wc(fd, b->bb_handle, mmap_start, mmap_len, PROT_WRITE); cs = (uint32_t *)((char *)ptr + cmd_offset - mmap_start); @@ -415,6 +420,7 @@ terminate_bb(struct w_step *w, struct w_step_eb *b, enum intel_engine_id engine, b->reloc.delta = (engine - VCS1) * sizeof(uint32_t); *cs++ = MI_STORE_DWORD_IMM; + b->mapped_address = (uint64_t *)cs; *cs++ = 0; *cs++ = 0; b->mapped_seqno = cs; @@ -469,7 +475,6 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, struct w_step_eb *b, } if (flags & SEQNO) { - b->reloc.presumed_offset = -1; b->reloc.target_handle = 1; b->obj[bb_i].relocs_ptr = to_user_pointer(>reloc); b->obj[bb_i].relocation_count = 1; @@ -485,8 +490,7 @@ alloc_step_batch(struct workload *wrk, struct w_step *w, struct w_step_eb *b, engine = VCS1; b->eb.flags = eb_engine_map[engine]; b->eb.flags |= I915_EXEC_HANDLE_LUT; - if (!(flags & SEQNO)) - b->eb.flags |= I915_EXEC_NO_RELOC; + b->eb.flags |= I915_EXEC_NO_RELOC; #ifdef DEBUG printf("%u: %u:%x|%x|%x|%x %10lu flags=%llx bb=%x[%u] ctx[%u]=%u\n", w->idx, b->eb.buffer_count, b->obj[0].handle, @@ -628,8 +632,9 @@ static void update_bb_seqno(struct w_step_eb *b, enum intel_engine_id engine, uint32_t seqno) { - *b->mapped_seqno = seqno; b->reloc.delta = (engine - VCS1) * sizeof(uint32_t); + *b->mapped_address = b->reloc.presumed_offset + b->reloc.delta; + *b->mapped_seqno = seqno; } static void -- 2.11.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH i-g-t v4] benchmarks/gem_wsim: Command submission workload simulator
On Thu, Apr 20, 2017 at 01:29:11PM +0100, Tvrtko Ursulin wrote: > +static void > +alloc_step_batch(struct workload *wrk, struct w_step *w, struct w_step_eb *b, > + enum intel_engine_id engine, unsigned int flags) > +{ > + unsigned int bb_i, j = 0; > + > + b->obj[j].handle = gem_create(fd, 4096); > + b->obj[j].flags = EXEC_OBJECT_WRITE; > + j++; > + > + if (flags & SEQNO) { > + b->obj[j].handle = wrk->status_page_handle; > + j++; > + } > + > + bb_i = j++; > + b->bb_sz = get_bb_sz(w->duration.max); > + b->bb_handle = b->obj[bb_i].handle = gem_create(fd, b->bb_sz); > + terminate_bb(w, b, engine, flags); > + > + igt_assert(w->dependency <= 0); > + if (w->dependency) { > + int dep_idx = w->idx + w->dependency; > + > + igt_assert(dep_idx >= 0 && dep_idx < wrk->nr_steps); > + igt_assert(wrk->steps[dep_idx].type == BATCH); > + > + b->obj[j].handle = b->obj[bb_i].handle; > + bb_i = j; > + b->obj[j - 1].handle = wrk->steps[dep_idx].b[0].obj[0].handle; > + j++; > + > + if (wrk->steps[dep_idx].b[1].obj[0].handle) { > + b->obj[j].handle = b->obj[bb_i].handle; > + bb_i = j; > + b->obj[j - 1].handle = > + wrk->steps[dep_idx].b[1].obj[0].handle; > + j++; > + } > + } > + > + if (flags & SEQNO) { > + b->reloc.presumed_offset = -1; So as I understand it, you are caching the execbuf/obj/reloc for the workload and then may reissue later with different seqno on different rings? In which case we have a problem as the kernel will write back the updated offsets to b->reloc.presumed_offset and b->obj[].offset and in future passes they will match and the seqno write will go into the wrong slot (if it swaps rings). You either want to reset presumed_offset=-1 each time, or better for all concerned write the correct address alongside the seqno (which also enables NORELOC). Delta incoming. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx