On Fri, 28 Aug 2015 19:00:51 +0100 "Ben Avison" <bavi...@riscosopen.org> wrote:
> On Fri, 28 Aug 2015 13:43:06 +0100, Pekka Paalanen <ppaala...@gmail.com>
> wrote:
>
> > On Thu, 27 Aug 2015 17:20:26 +0100
> > "Ben Avison" <bavi...@riscosopen.org> wrote:
> >> One thing it wouldn't be able to detect, though, would be where the
> >> fetch/combine/writeback iterators are faster than fast paths for the
> >> *same* implementation level - such as with the ARMv6 nearest-scaled
> >> patches I was revisiting recently. In that specific case, it turned out
> >> that my original solution of bespoke C wrappers for the fetchers turned
> >> out to be even faster - but we don't have any way at present of
> >> detecting if there are other cases where we would be better off
> >> deleting the fast paths and letting the iterators do the work instead.
> >
> > Sorry, but I'm a bit hazy on the details here. Based on the
> > discussions, I have developed the following mental model:
> >
> > 1. asm fast paths (whole operation)
> > 2. C fast paths (whole operation)
> > 3. _general_composite_rect (fetch/combine/writeback; iterators)
> >    - asm implementation or
> >    - C implementation for each
>
> Yes, that's pretty much it, except that some platforms have multiple
> levels of asm fast paths, and some or all of those will be enabled
> depending upon the CPU features detected via a combination of
> compile-time and runtime tests.
>
> Basically, there is a chain of pixman_implementation_t structs, in
> decreasing priority order (that's why you'll see the name
> "implementation" used to refer to a set of routines tuned for a
> particular instruction set). Each implementation contains a table of
> pixman_fast_path_t structs (which we refer to as "fast paths") and a
> table of pixman_iter_info_t structs (the fetcher and writeback iterators)
> and an array of combiner routines and a few other bits and pieces.
>
> For example, on an ARMv7 platform, you'll normally find the following
> implementations are enabled, in decreasing priority order:
>
>     pixman-noop.c (can't be disabled)
>     pixman-arm-neon.c (unless PIXMAN_DISABLE contains "arm-neon")
>     pixman-arm-simd.c (unless PIXMAN_DISABLE contains "arm-simd")
>     pixman-fast-path.c (unless PIXMAN_DISABLE contains "fast")
>     pixman-general.c (can't be disabled; also references last-resort
>         functions in pixman-bits-image.c / pixman-*-gradient.c /
>         pixman-combine32.c / pixman-combine-float.c)
>
> When you call pixman_image_composite(), it scans through the fast paths
> from each implementation in order, looking for one which matches the
> criteria in the fast path tables. pixman-general.c contains a single fast
> path, which is universally applicable, and therefore handles anything
> that wasn't caught by higher implementations - and it uses the function
> general_composite_rect(). In turn, general_composite_rect scans the
> implementations in order, looking for fetchers, combiners and writeback
> functions which will allow it to perform the requested operation line by
> line, stage by stage.
>
> When you set PIXMAN_DISABLE, you knock out the whole of an
> implementation, both its fast paths and its iterators/combiners.
>
> The point I was trying to make (badly, it seems) is that
> iterators/combiners are relatively widely applicable, and are chosen at
> lower priority than all the fast paths, but because they were developed
> relatively recently, many of the fast paths have never had their
> performance compared against the iterators/combiners to see if their
> inclusion is perhaps no longer warranted since the iterators/combiners
> were added.

Thank you for the excellent explanation. I'm going to bookmark this, in
case anyone else asks. :-)

> > Maybe we could fix that by introducing a PIXMAN_DISABLE=wholeop or
> > similar, that would disable all whole operation fast paths, but leave
> > the iterator paths untouched?
> >
> > Should I do that, would it be worth it?
>
> It could probably be done in _pixman_implementation_create(), as long as
> _pixman_implementation_create_general() explicitly initialises
> imp->fast_paths so that at least general_composite_rect() always ends up
> on the chain of fast paths.

Ok, I'll keep that in mind in case we want to test things.


Thanks,
pq
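
For reference, the lookup order described in the quoted explanation, together with the PIXMAN_DISABLE=wholeop idea, can be sketched roughly as below. This is only a toy model under stated assumptions: the struct, the function names (sketch_impl_t, sketch_chain_top, sketch_lookup) and the operation strings are all hypothetical and are not pixman's real data structures or API. The chain is reduced to three levels, and the "wholeop" switch is modelled as skipping every fast path table except the one belonging to the last (general) implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model. These are NOT pixman's real structs. */
typedef struct sketch_impl {
    const char *name;                   /* illustrative label, e.g. "arm-neon" */
    const char *const *fast_ops;        /* NULL-terminated fast path table */
    const struct sketch_impl *fallback; /* next implementation, lower priority */
} sketch_impl_t;

/* Made-up operation names; "*" stands for the universally applicable
 * fast path that the general implementation provides. */
static const char *const neon_ops[]    = { "over_8888_8888", NULL };
static const char *const simd_ops[]    = { "over_8888_8888", "src_8888_0565", NULL };
static const char *const general_ops[] = { "*", NULL };

static const sketch_impl_t general_impl = { "general",  general_ops, NULL };
static const sketch_impl_t simd_impl    = { "arm-simd", simd_ops,    &general_impl };
static const sketch_impl_t neon_impl    = { "arm-neon", neon_ops,    &simd_impl };

/* Highest-priority implementation in this toy chain. */
const sketch_impl_t *sketch_chain_top(void)
{
    return &neon_impl;
}

/* Walk the chain in decreasing priority order and return the name of the
 * first implementation whose fast path table matches op.
 * If disable_wholeop is nonzero, every table except the last one in the
 * chain is skipped, so the request always lands on the general path. */
const char *sketch_lookup(const sketch_impl_t *top, const char *op,
                          int disable_wholeop)
{
    const sketch_impl_t *imp;

    assert(top != NULL && op != NULL);

    for (imp = top; imp != NULL; imp = imp->fallback) {
        size_t i;

        if (disable_wholeop && imp->fallback != NULL)
            continue; /* whole-operation fast paths knocked out */

        for (i = 0; imp->fast_ops[i] != NULL; i++) {
            if (strcmp(imp->fast_ops[i], "*") == 0 ||
                strcmp(imp->fast_ops[i], op) == 0)
                return imp->name;
        }
    }
    return NULL; /* unreachable while the general entry is on the chain */
}
```

Calling sketch_lookup(sketch_chain_top(), "over_8888_8888", 0) resolves to the highest-priority table that lists the operation, while passing 1 as the last argument forces the same request down to the general entry. In the real library that entry would then pick fetchers, combiners and writeback iterators, which this sketch does not model.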
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman