On Tue, 5 Dec 2017 22:50:21 -0300 Vinícius dos Santos Oliveira <vini.ipsma...@gmail.com> said:
> 2017-12-03 10:19 GMT-03:00 Carsten Haitzler <ras...@rasterman.com>:
>
> > also i dislike the idea of a jit. i would prefer something like
> > "pre-compile all permutations into special case functions then runtime
> > call the right/matching one".
>
> I'm not familiar with implementing gfx engines, but with my (small)
> background on compilers and the giant notes page[1], I assume there are
> optimizations that just can't be implemented otherwise.
>
> The API is like:
>
> ctx.draw_something1();
> ctx.draw_something2();
> ctx.draw_something3();
> ctx.end(); //< here the pipeline is compiled

generally (at least for us) there is very little to be gained by optimizing
the "between something1/2/3". jits are good for this when you have really
small operations like cpu instructions (a = b + 1 for example): keeping the
same data in the same register across a run of them is a good idea and is
going to yield good speedups. for our graphics needs we generally do large
batch operations like "do operation OP with params a=x, b=y, c=z from
rectangle X to rectangle Y", where X and Y are large regions of pixels
(hundreds, thousands or millions of them). trying to optimize tiny
operations on single pixels is a long tail of work that on average is going
to yield very little in real life.

this leads to the ability to pre-compile these operations. most of the time
for a param like a you can special case 0 and 255 (or 1.0) and skip some
muls or turn the op into a NOP entirely depending on what you are doing, but
for all other values the generic math has to be done. you can do that with
regular cpu ops (sometimes with some tricks up your sleeve to do simd-like
ops with a regular cpu op as long as you don't need saturation math), or
write an assembly version.

> In this API, the gfx engine can take info on cross-function information. If
> draw_something3 completely hides draw_something2, one call can be
> completely removed.
> It's actually boring, because you have to recompile the whole pipeline
> every time a value changes, but the plans to support shaders will change
> the situation[2][3].
>
> There are drawbacks too. Evas already does interesting things and I just
> don't know if this blend2d would actually be helpful. Evas is pretty damn
> fast. And blend2d is wholly CPU side, no GPU (but I still think it's
> interesting because of Bézier curves).

sure. in some ways a jit is good. but pre-compiling is more generic and
portable. in the end the lack of support for major architectures that we
support kind of puts it on the veto list. :( if it just generated some c src
for all permutations and then we compiled it in... we'd at least have a
portable one (without specialized assembly for mmx/sse, neon etc.). but it
doesn't do this. :( it does do mmx/sse etc. in the jit... but only for intel
and that is a problem.

yes - they make a good point that simd asm ops need alignment cleanup code
and that this expands library size. that is true, but it's not THAT onerous.
they also make a good point about eliminating tmp buffers. we actually
rarely use tmp buffers ourselves and have our operations already pre-coded,
but we could do better. i was mulling doing a new sw renderer with tiles, so
yes - we'd have "temp buffers", but they'd all fit well inside l1 cache and
that would be a big win when blending/operating over the same region
multiple times, as the destination at least will stay in cache for fast
read-modify-write performance for blends. to do all of this i was mulling
making a code generator to generate all the operations and functions from
templates and so on... :)

i have for a long time been mulling the idea of "texture compression" (s3tc,
etc1/2 etc. style) even for software: allow more interesting pixel formats
that reduce memory bandwidth needs significantly with constant storage cost.
a sw engine can even be super smart and switch storage format tile by tile
depending on what works best for that tile. often regions of an image are
"empty" and thus there is no need to even store any pixels for them - just a
flag of "this is transparent" or "this is a single ARGB color". then when
rendering, these regions can optimize their src fetches significantly... and
save memory to boot.

> Oh, and thanks for replying to my email. It was just a different point of
> view, just as I wanted to have.
>
> [1] https://blend2d.com/notes.html
> [2] https://blend2d.com/roadmap.html

yes. they say they want to support arm. but still there's ppc, mips, ... and
more... :) there is no way i'd want to maintain 2 rendering engine paths in
the long run, and any short term should be very short. i'd want a lot more
architectures already done and finished and supported before even
considering...

> [3] https://gist.github.com/vinipsmaker/08349a74566df4c4a9bf82624c13a33b
>
> --
> Vinícius dos Santos Oliveira
> https://vinipsmaker.github.io/

--
------------- Codito, ergo sum - "I code, therefore I am" --------------
Carsten Haitzler - ras...@rasterman.com
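
Editorial sketch (not from the original mail, and not Evas code): the per-tile storage idea above, where a tile is either "this is transparent", "this is a single ARGB color" or a full pixel array, might look roughly like the C below. All names (Tile, Tile_Kind, tile_classify, tile_blend, mul_256), the 8x8 tile size and the premultiplied-ARGB assumption are hypothetical choices made for illustration. The mul_256 helper also shows one form of the "simd-like op with a regular cpu op" trick mentioned earlier: it scales two channels per 32-bit multiply and approximates /255 with /256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TILE_SIZE (8 * 8) /* pixels per tile; 8x8 is an arbitrary choice */

/* Hypothetical per-tile storage kinds (illustrative names, not Evas API). */
typedef enum {
    TILE_TRANSPARENT, /* nothing stored: "this is transparent" */
    TILE_SOLID,       /* one color stored: "this is a single ARGB color" */
    TILE_RAW          /* full array of TILE_SIZE pixels */
} Tile_Kind;

typedef struct {
    Tile_Kind kind;
    uint32_t color;         /* valid for TILE_SOLID */
    const uint32_t *pixels; /* valid for TILE_RAW */
} Tile;

/* Scale all four 8-bit channels of c by a/256 (a in 0..256) using two
 * ordinary 32-bit multiplies; no saturation is needed because a <= 256. */
static uint32_t mul_256(uint32_t a, uint32_t c)
{
    return ((((c >> 8) & 0x00ff00ffu) * a) & 0xff00ff00u) |
           ((((c & 0x00ff00ffu) * a) >> 8) & 0x00ff00ffu);
}

/* "over" blend; assumes premultiplied ARGB so the add cannot overflow. */
static uint32_t blend_over(uint32_t src, uint32_t dst)
{
    return src + mul_256(256 - (src >> 24), dst);
}

/* Pick the cheapest storage kind that represents the given pixels. */
static Tile tile_classify(const uint32_t *px)
{
    Tile t = { TILE_SOLID, px[0], NULL };
    for (size_t i = 1; i < TILE_SIZE; i++) {
        if (px[i] != px[0]) {
            t.kind = TILE_RAW;
            t.color = 0;
            t.pixels = px;
            return t;
        }
    }
    if ((px[0] >> 24) == 0) {
        t.kind = TILE_TRANSPARENT;
        t.color = 0;
    }
    return t;
}

/* Blend one source tile over a destination tile of TILE_SIZE pixels. */
static void tile_blend(const Tile *src, uint32_t *dst)
{
    assert(src != NULL && dst != NULL);
    switch (src->kind) {
    case TILE_TRANSPARENT:
        return; /* NOP: no source fetch, no destination write */
    case TILE_SOLID:
        if ((src->color >> 24) == 0xff) {
            /* opaque: plain fill, destination is never read */
            for (size_t i = 0; i < TILE_SIZE; i++) dst[i] = src->color;
        } else {
            for (size_t i = 0; i < TILE_SIZE; i++)
                dst[i] = blend_over(src->color, dst[i]);
        }
        return;
    case TILE_RAW:
        for (size_t i = 0; i < TILE_SIZE; i++)
            dst[i] = blend_over(src->pixels[i], dst[i]);
        return;
    }
}
```

In this sketch the special cases are chosen once per tile rather than once per pixel, so a transparent tile costs nothing and an opaque solid tile becomes a fill. A real engine would presumably add compressed formats as further Tile_Kind values; that part is left out here.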