On Tue, 5 Dec 2017 22:50:21 -0300 Vinícius dos Santos Oliveira
<vini.ipsma...@gmail.com> said:

> 2017-12-03 10:19 GMT-03:00 Carsten Haitzler <ras...@rasterman.com>:
> 
> > also i dislike the idea of a jit. i would prefer something like
> > "pre-compile all permutations into special case functions then runtime
> > call the right/matching one".
> >
> 
> I'm not familiar with implementing gfx engines, but with my (small)
> background on compilers and the giant notes page[1], I assume there are
> optimizations that just can't be implemented any other way.
> 
> The API is like:
> 
> ctx.draw_something1();
> ctx.draw_something2();
> ctx.draw_something3();
> ctx.end(); //< here the pipeline is compiled

generally (at least for us) there is very little to be gained optimizing the
"between something1/2/3". jits are good for this when you have really small
operations like cpu commands (a = b + 1 for example), where chaining them so
that the same data stays in the same register etc. is a good idea and is going
to yield good speedups.

for our graphics needs we generally do large batch operations like "do
operation X with param a=x, b=y, c=z from rectangle X to rectangle Y" and X and
Y rectangles are large regions of pixels (100's, 1000's or millions of them).
trying to optimize for tiny operations on single pixels is a long tail of work
that on average is going to yield very little in real life.

this leads to the ability to pre-compile these operations. most of the time, for
a param like a, you can special case 0 and 255 (or 1.0) and skip some muls or
turn the op into a NOP entirely depending on what you are doing. for all other
values the generic math has to be done, and you can do it with regular cpu ops
(sometimes with some tricks up your sleeve to do simd-like ops with a regular
cpu op as long as you don't need saturation math), or write an assembly version
to do it.
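
a rough sketch of that idea in c, assuming a "lerp src over dst with one
constant alpha" op on 32bit ARGB pixels. every name here (span_lerp_*, mul_256)
is made up for illustration and is not taken from evas. the special cases for 0
and 255 skip the math, the generic case does two channels per multiply with
plain integer ops, and the variant is picked once per batch:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* hypothetical names, for illustration only */
typedef void (*span_lerp_fn)(uint32_t *dst, const uint32_t *src, size_t n,
                             uint32_t a);

/* scale all 4 channels of a pixel by a / 256 (a in 0..256), two channels per
 * 32bit multiply. an 8bit lane times at most 256 fits in 16 bits, so lanes
 * never carry into each other and no saturation math is needed */
static inline uint32_t mul_256(uint32_t c, uint32_t a)
{
   return ((((c >> 8) & 0x00ff00ff) * a) & 0xff00ff00) |
          ((((c & 0x00ff00ff) * a) >> 8) & 0x00ff00ff);
}

/* a == 0: dst is left as it is, the whole op is a NOP */
static void span_lerp_nop(uint32_t *dst, const uint32_t *src, size_t n,
                          uint32_t a)
{
   (void)dst; (void)src; (void)n; (void)a;
}

/* a == 255: dst becomes src, no multiplies at all */
static void span_lerp_copy(uint32_t *dst, const uint32_t *src, size_t n,
                           uint32_t a)
{
   (void)a;
   memcpy(dst, src, n * sizeof(uint32_t));
}

/* any other a: dst = src * a + dst * (1 - a) using the generic math */
static void span_lerp_generic(uint32_t *dst, const uint32_t *src, size_t n,
                              uint32_t a)
{
   uint32_t a256 = a + (a >> 7); /* map 0..255 onto 0..256 */

   for (size_t i = 0; i < n; i++)
     dst[i] = mul_256(src[i], a256) + mul_256(dst[i], 256 - a256);
}

/* pick the matching pre-compiled variant once per batch, not per pixel */
static span_lerp_fn span_lerp_get(uint32_t a)
{
   if (a == 0) return span_lerp_nop;
   if (a >= 255) return span_lerp_copy;
   return span_lerp_generic;
}
```

a caller would do something like span_lerp_get(a)(dst, src, n, a) for each
span, so the branch on a costs nothing compared to the thousands of pixels
handled per call.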

> In this API, the gfx engine can make use of cross-function information. If
> draw_something3 completely hides draw_something2, one call can be
> completely removed. It's actually boring, because you have to recompile the
> whole pipeline every time a value changes, but the plans to support shaders
> will change the situation[2][3].
> 
> There are drawbacks too. Evas already does interesting things and I just
> don't know if this blend2d would actually be helpful. Evas is pretty damn
> fast. And blend2d is entirely CPU side, no GPU (but I still think it's
> interesting because of Bézier curves).

sure. in some ways a jit is good. but pre-compiling is more generic and
portable. in the end the lack of support for major architectures that we
support kind of puts it on the veto list. :( if it just generated some c src
for all permutations and then we compiled it in... we'd at least have a
portable one (without specialized assembly for mmx/sse, neon etc.). but it
doesn't do this. :( it does do mmx/sse etc. in the jit... but only for intel
and that is a problem.
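
to make "c src for all permutations" a bit more concrete, here is a minimal
sketch where the c preprocessor stands in for an external generator. the ops
(copy, xor, and), the two source kinds and all the names are hypothetical
examples, not what blend2d or evas actually provide. one template macro stamps
out a plain portable function per permutation and a table picks the matching
one at runtime:

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*span_fn)(uint32_t *dst, const uint32_t *src, uint32_t col,
                        size_t n);

/* per-pixel operations (hypothetical examples) */
#define OP_COPY(d, s) (s)
#define OP_XOR(d, s)  ((d) ^ (s))
#define OP_AND(d, s)  ((d) & (s))

/* where the source value comes from: a pixel array or one constant color */
#define SRC_PIXEL(src, col, i) ((src)[i])
#define SRC_COLOR(src, col, i) (col)

/* the template: one loop, specialized at compile time by OP and SRC */
#define DEFINE_SPAN(name, OP, SRC) \
static void name(uint32_t *dst, const uint32_t *src, uint32_t col, size_t n) \
{ \
   (void)src; (void)col; \
   for (size_t i = 0; i < n; i++) \
     dst[i] = OP(dst[i], SRC(src, col, i)); \
}

/* stamp out every permutation: 3 ops x 2 source kinds = 6 functions */
DEFINE_SPAN(span_copy_pixel, OP_COPY, SRC_PIXEL)
DEFINE_SPAN(span_copy_color, OP_COPY, SRC_COLOR)
DEFINE_SPAN(span_xor_pixel,  OP_XOR,  SRC_PIXEL)
DEFINE_SPAN(span_xor_color,  OP_XOR,  SRC_COLOR)
DEFINE_SPAN(span_and_pixel,  OP_AND,  SRC_PIXEL)
DEFINE_SPAN(span_and_color,  OP_AND,  SRC_COLOR)

enum { OP_ID_COPY, OP_ID_XOR, OP_ID_AND, OP_ID_COUNT };
enum { SRC_ID_PIXEL, SRC_ID_COLOR, SRC_ID_COUNT };

/* runtime lookup of the right/matching pre-compiled function */
static const span_fn span_table[OP_ID_COUNT][SRC_ID_COUNT] = {
   [OP_ID_COPY] = { span_copy_pixel, span_copy_color },
   [OP_ID_XOR]  = { span_xor_pixel,  span_xor_color  },
   [OP_ID_AND]  = { span_and_pixel,  span_and_color  },
};
```

this builds on anything with a c compiler, and an arch specific version
(mmx/sse, neon) could later replace individual table entries without touching
the callers.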

yes - they make good points about the alignment cleanups that simd asm ops need
and how this expands library size. that is true. but it's not THAT onerous.
they also make a good point about eliminating tmp buffers. we actually rarely
use tmp buffers ourselves and have our operations already pre-coded, but we
could do better.

i was mulling doing a new sw renderer with tiles so yes - we'd have "temp
buffers" but they'd all fit well inside l1 cache and that would be a big win
when blending/operating over the same region multiple times as the destination
at least will stay in cache for fast read-modify-write performance for blends.
to do all of this i was mulling making a code generator to generate all the
operations and functions from templates and so on... :)

i have for a long time been mulling the idea of "texture compression" (s3tc,
etc1/2 etc. style) even for software. allow more interesting pixel formats that
reduce memory bandwidth needs significantly with constant storage cost. a sw
engine can even be super smart and switch storage format tile by tile depending
on what works best for that tile. often regions of an image are "empty" and
thus there is no need to even store any pixels for them - just a flag of "this
is transparent" or "this is a single ARGB color" and then when rendering, these
regions can optimize their src fetches significantly... and save memory to boot.
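
a minimal sketch of the per-tile flag idea, assuming 32x32 tiles of
premultiplied ARGB (so a fully transparent pixel is 0). the tile size, the
struct and all function names are assumptions made up for this example, and it
only covers the "transparent" and "single color" flags, not real s3tc/etc style
compression:

```c
#include <stddef.h>
#include <stdint.h>

#define TILE_SIZE 32 /* 32 x 32 x 4 bytes = 4 KiB per raw tile */

typedef enum { TILE_TRANSPARENT, TILE_SOLID, TILE_RAW } tile_format;

typedef struct {
   tile_format     format;
   uint32_t        color;  /* only used for TILE_SOLID */
   const uint32_t *pixels; /* only used for TILE_RAW */
} tile;

/* pick the cheapest storage format that can represent this tile */
static tile tile_classify(const uint32_t *pixels)
{
   tile t = { TILE_RAW, 0, pixels };
   uint32_t first = pixels[0];

   for (size_t i = 1; i < TILE_SIZE * TILE_SIZE; i++)
     if (pixels[i] != first) return t; /* mixed content: keep raw pixels */

   t.pixels = NULL; /* uniform tile: no pixel storage needed */
   if (first == 0) t.format = TILE_TRANSPARENT;
   else
     {
        t.format = TILE_SOLID;
        t.color = first;
     }
   return t;
}

/* src fetch: only raw tiles ever touch pixel memory */
static uint32_t tile_fetch(const tile *t, size_t x, size_t y)
{
   switch (t->format)
     {
      case TILE_TRANSPARENT: return 0;
      case TILE_SOLID:       return t->color;
      default:               return t->pixels[(y * TILE_SIZE) + x];
     }
}

/* pixel storage the tile needs beyond the small descriptor itself */
static size_t tile_storage_bytes(const tile *t)
{
   if (t->format != TILE_RAW) return 0;
   return TILE_SIZE * TILE_SIZE * sizeof(uint32_t);
}
```

a renderer built like this could skip transparent tiles completely and treat
solid tiles as a constant color fill, so only the raw tiles cost memory
bandwidth on the src side.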

> Oh, and thanks for replying to my email. It was just a different point of
> view, just as I wanted to have.
> 
> [1] https://blend2d.com/notes.html
> [2] https://blend2d.com/roadmap.html

yes. they say they want to support arm. but still there's ppc, mips, ... and
more... :) there is no way i'd want to maintain 2 rendering engine paths in the
long run and short term should be very short. i'd want a lot more architectures
already done and finished and supported before even considering...

> [3] https://gist.github.com/vinipsmaker/08349a74566df4c4a9bf82624c13a33b
> 
> -- 
> Vinícius dos Santos Oliveira
> https://vinipsmaker.github.io/


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
Carsten Haitzler - ras...@rasterman.com


_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel