On Mon, Jul 25, 2016 at 1:19 PM, Marek Olšák <mar...@gmail.com> wrote: > On Mon, Jul 25, 2016 at 5:42 PM, Rob Clark <robdcl...@gmail.com> wrote: >> On Mon, Jul 25, 2016 at 11:16 AM, Brian Paul <bri...@vmware.com> wrote: >>> On 07/18/2016 07:11 AM, Marek Olšák wrote: >>>> >>>> @@ -183,49 +107,42 @@ static void check_attrib_edgeflag(struct st_context >>>> *st) >>>> >>>> void st_validate_state( struct st_context *st, enum st_pipeline pipeline >>>> ) >>>> { >>>> - const struct st_tracked_state **atoms; >>>> - struct st_state_flags *state; >>>> - GLuint num_atoms; >>>> - GLuint i; >>>> + uint64_t dirty, pipeline_mask; >>>> + uint32_t dirty_lo, dirty_hi; >>>> + >>>> + /* Get Mesa driver state. */ >>>> + st->dirty |= st->ctx->NewDriverState & ST_ALL_STATES_MASK; >>>> + st->ctx->NewDriverState = 0; >>>> >>>> /* Get pipeline state. */ >>>> switch (pipeline) { >>>> - case ST_PIPELINE_RENDER: >>>> - atoms = render_atoms; >>>> - num_atoms = ARRAY_SIZE(render_atoms); >>>> - state = &st->dirty; >>>> + case ST_PIPELINE_RENDER: >>>> + check_attrib_edgeflag(st); >>>> + check_program_state(st); >>>> + st_manager_validate_framebuffers(st); >>>> + >>>> + pipeline_mask = ST_PIPELINE_RENDER_STATE_MASK; >>>> break; >>>> case ST_PIPELINE_COMPUTE: >>>> - atoms = compute_atoms; >>>> - num_atoms = ARRAY_SIZE(compute_atoms); >>>> - state = &st->dirty_cp; >>>> + pipeline_mask = ST_PIPELINE_COMPUTE_STATE_MASK; >>>> break; >>>> default: >>>> unreachable("Invalid pipeline specified"); >>>> } >>>> >>>> - /* Get Mesa driver state. */ >>>> - st->dirty.st |= st->ctx->NewDriverState; >>>> - st->dirty_cp.st |= st->ctx->NewDriverState; >>>> - st->ctx->NewDriverState = 0; >>>> - >>>> - if (pipeline == ST_PIPELINE_RENDER) { >>>> - check_attrib_edgeflag(st); >>>> - >>>> - check_program_state(st); >>>> - >>>> - st_manager_validate_framebuffers(st); >>>> - } >>>> - >>>> - if (state->st == 0 && state->mesa == 0) >>>> + dirty = st->dirty & pipeline_mask; >>>> + if (!dirty) >>>> return; >>>> >>>> - /*printf("%s %x/%x\n", __func__, state->mesa, state->st);*/ >>>> + dirty_lo = dirty; >>>> + dirty_hi = dirty >> 32; >>>> >>>> - for (i = 0; i < num_atoms; i++) { >>>> - if (check_state(state, &atoms[i]->dirty)) >>>> - atoms[i]->update( st ); >>>> - } >>>> + /* Update states. */ >>>> + while (dirty_lo) >>>> + atoms[u_bit_scan(&dirty_lo)]->update(st); >>>> + while (dirty_hi) >>>> + atoms[32 + u_bit_scan(&dirty_hi)]->update(st); >>>> >>> >>> Could we just use the u_bit_scan64() function and void the hi/lo split? >> >> fwiw, we actually did discuss that on irc, but I guess no one >> summarized on email thread.. >> >> Marek's concern was that would generate worse code on 32b since it >> would pull the right-shift into the loop. >> >> I'm not entirely sure if gcc would be clever enough in this case or >> not. I guess someone needs to compare generated asm in both cases. >> And either use u_bit_scan64() if the compiler is clever enough, or add >> a comment explaining the reason. > > Yeah, I added this comment before the loops: > "Don't use u_bit_scan64, it may be slower on 32-bit." > > On 32-bit, ffsll is an if-then-else expression with some arithmetic > and shifting one bit to the left is another if-then-else expression.
fwiw, I did spend a bit of time this evening playing around with this, and the dirty_hi/dirty_lo approach w/ 32b/i686 build works out to be something like 12 instructions shorter for the loop body, ie. gcc isn't clever enough (total instruction count increases by doubling the loops but I think that doesn't matter).. given that this is a hot spot in profiles that I've looked at, it might even be worth having some #ifdef 64b / #else.. but ofc that could be left as a future exercise if someone cares.. either way, r-b BR, -R _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev