Re: RENDER question

The Rasterman Fri, 29 Aug 2003 00:21:38 +0000
On Thu, 28 Aug 2003 12:47:26 +0200 Thomas Winischhofer <[EMAIL PROTECTED]>
babbled:

> Carsten Haitzler (The Rasterman) wrote:
> > aaaaah shared framebuffer. ok. then i can understand some of the "hurt" :)
> >
> >
> >>What values do you get on your hardware? (Unscaled only, the rest
> >>depends entirely on the CPU)
> >
> >
> > for the benchmarker:
> > *** ROUND 1 ***
> > ---------------------------------------------------------------
> > Test: Test Xrender doing non-scaled Over blends
> > Time: 12.445 sec.
> > ---------------------------------------------------------------
> > Test: Test Xrender (offscreen) doing non-scaled Over blends
> > Time: 10.056 sec.
> > ---------------------------------------------------------------
> > Test: Test Imlib2 doing non-scaled Over blends
> > Time: 0.332 sec.
>
> That is strange. Without acceleration, I get
>
> 9.7
> 1.7
> 2.3
>
> Seems imlib uses the video RAM.

definitely not. imlib2 only uses system ram - all its buffers are a direct
result of malloc() :) imlib2 has no clue that video hardware exists :)

> > (xrender doesn't have accel turned on there. if i turn it on it beats imlib2
> > by 3-4 times. i can't get to that box right now... thanks to my isp being
> > screwed :)
>
> That's about the same factor as I get here.
>
> > i don't have the old code working anymore for the gl engine i had - i just
> > remember getting full screen 1600x1200 composites and scales going from
> > somewhere like 2 to 50+ fps once opengl got slid under the bonnet.
>
> No GL for newer SiS chips, therefore I need to use the 2D engine.

ok :)

> >>I need to copy the texture to video RAM once, unless somebody tells me
> >>it already is there (I could use the 2D accelerator for this, too, then.
> >>In this case I just wonder why the mga driver doesn't do it this way.).
> >
> >
> > is this per composite? or just the first time it (the pixmap lets say) is
> > created?
>
> Frankly, I don't know. I haven't looked into the composite function yet.

it might be worth examining :)

> >>Since the accelerator does not sync after initiating the command, using
> >>the provided memory area is unsafe. The app might reuse it for
> >>something else before the command is actually executed. Syncing after
> >>the command is insane (because it could take forever, depending on the
> >>number of commands already in the queue - and this queue is BIG)
> >
> >
> > hmmm do you have a way of knowing where the accelerator is up to?
>
> Yes, I can check the queue location anytime. But doing this before every
> accelerator command slows the whole thing down dramatically.

hmm - so there's no simple counter? maybe every N commands do a "sync" and find
out where you're up to (or every N seconds - whichever comes first)? :)
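A minimal sketch of the "check every N commands or every so often" idea, in C. Everything here is hypothetical: `QueueTracker`, `tracker_submit`, `tracker_buffer_free`, the `SYNC_EVERY_N`/`SYNC_EVERY_NS` thresholds, and the `hw_done` location that stands in for however the engine's queue position is actually read (no SiS register is implied). The idea is that a client buffer can be reused once the last known engine position has passed the command that used it, without reading the position before every command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning knobs, not values from any real driver. */
#define SYNC_EVERY_N   64u          /* read the engine position every 64 commands */
#define SYNC_EVERY_NS  1000000ull   /* ...or at least once per millisecond */

typedef struct {
    uint64_t submitted;      /* sequence number of the last queued command */
    uint64_t known_done;     /* last sequence number known to have executed */
    uint64_t since_check;    /* commands queued since the last position read */
    uint64_t last_check_ns;  /* timestamp of the last position read */
} QueueTracker;

/* Call after writing one command to the queue; returns its sequence number.
 * hw_done is a hypothetical location holding the sequence number of the last
 * command the engine has finished.  It is only read every SYNC_EVERY_N
 * commands or when SYNC_EVERY_NS has elapsed, whichever comes first. */
uint64_t tracker_submit(QueueTracker *t, uint64_t now_ns,
                        const volatile uint64_t *hw_done)
{
    uint64_t seq = ++t->submitted;
    if (++t->since_check >= SYNC_EVERY_N ||
        now_ns - t->last_check_ns >= SYNC_EVERY_NS) {
        t->known_done = *hw_done;
        t->since_check = 0;
        t->last_check_ns = now_ns;
    }
    return seq;
}

/* May the memory used by command `seq` be handed back to the client? */
bool tracker_buffer_free(const QueueTracker *t, uint64_t seq)
{
    assert(seq <= t->submitted);
    return seq <= t->known_done;
}
```

The answer is deliberately conservative: between checks a buffer may still be reported busy after the engine has finished with it, which only delays reuse instead of risking a corrupted blit.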

> > ie interrupts
> > etc?
>
> No interrupts.

:(

> >>It's a fast CPU with fast RAM, and a slow GPU with memory shared with
> >>the CPU. More can't be expected, I guess.
> >
> > 2.3 seconds for a blend doesn't smell like a fast cpu :) my 1.7ghz athlon
> > gets the 1:1 blends done in 0.3 secs or so. so technically my desktop cpu
> > (ram/bus etc.) is still double the speed of your sis gfx chips :)
>
> imlib probably handles this in video RAM. This is a 2.0GHz P4. As of
> now, I still consider this quite a fast one...

nope. definitely not. all imlib2 ops are in system ram returned from malloc, with
inline mmx asm for blending and scaling ops. scaling is capable of full
super/sub sampling scaling up and down, but i've disabled supersampling when
scaling down to match xrender's output.
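To illustrate the difference between supersampling and plain point sampling when scaling down, here is a generic sketch on one row of 8-bit grey pixels. It is not imlib2's code (which works on ARGB data with MMX); `downscale_point` and `downscale_box` are hypothetical names, and a simple box filter stands in for supersampling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Point sampling: each destination pixel copies exactly one source pixel. */
void downscale_point(const uint8_t *src, size_t src_w,
                     uint8_t *dst, size_t dst_w)
{
    assert(dst_w > 0 && dst_w <= src_w);
    for (size_t x = 0; x < dst_w; x++)
        dst[x] = src[x * src_w / dst_w];
}

/* Supersampling (box filter): each destination pixel is the rounded average
 * of every source pixel that maps onto it. */
void downscale_box(const uint8_t *src, size_t src_w,
                   uint8_t *dst, size_t dst_w)
{
    assert(dst_w > 0 && dst_w <= src_w);
    for (size_t x = 0; x < dst_w; x++) {
        size_t start = x * src_w / dst_w;
        size_t end = (x + 1) * src_w / dst_w;  /* exclusive; end > start when downscaling */
        unsigned sum = 0;
        for (size_t s = start; s < end; s++)
            sum += src[s];
        size_t count = end - start;
        dst[x] = (uint8_t)((sum + count / 2) / count);
    }
}
```

Halving a row of alternating black and white pixels that starts with black, point sampling returns all black, while the box filter returns mid-grey (128 after rounding). Point sampling does less work per pixel, which is why it is the cheaper choice.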

> >>Text drawing (x11perf -aa24text) went from 25000 to 105000, which is
> >>more than a factor of 4. I am satisfied. (Now, if I just could find out why
> >>the accelerator functions are not being called on my 4.3 system...)
> >
> > that's good :) though i still like to compare x performance against external
> > code (ie like imlib2 - or gdk-pixbuf, or anything else) and always try to
> > at least equal your "software rivals" :) i really want to see x accelerating
> > where hardware can and beating the PANTS off any software code :)
>
> Up to a certain degree, yes. However, if we want to keep people from
> screaming "X is bloated", we need to have some generic functions.
> imlib'n stuff might contain assembler/MMX routines for special
> situations, which will beat the generic X routines, of course. That will
> not change. (Not speaking about portability here.)

but that doesn't mean x can't have mmx/sse/sse2/altivec routines too - it really
isn't that much code. you only replace the core loop of the most common
operations. compared to x's current size - this is NOTHING. :)
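To give a sense of how small such a core loop is, here is a scalar C sketch of the Over operator on premultiplied a8r8g8b8 pixels (dst = src + dst * (1 - src_alpha)), the kind of loop an MMX/SSE/AltiVec version would replace. `over_argb32` and `mul_div255` are hypothetical names, and the code is not taken from the X server or imlib2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (a * b) / 255, correctly rounded, for a and b in 0..255. */
static inline uint32_t mul_div255(uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 128;
    return (t + (t >> 8)) >> 8;
}

/* Over on premultiplied a8r8g8b8: every channel (alpha included) becomes
 * src + dst * (255 - src_alpha) / 255. */
void over_argb32(const uint32_t *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t s = src[i];
        uint32_t inv_alpha = 255 - (s >> 24);
        if (inv_alpha == 0) {          /* opaque source: plain copy */
            dst[i] = s;
            continue;
        }
        uint32_t d = dst[i];
        uint32_t out = 0;
        for (unsigned shift = 0; shift < 32; shift += 8) {
            uint32_t c = ((s >> shift) & 0xff)
                       + mul_div255((d >> shift) & 0xff, inv_alpha);
            assert(c <= 255);          /* holds for valid premultiplied input */
            out |= c << shift;
        }
        dst[i] = out;
    }
}
```

A SIMD version would handle all four channels of a pixel (or several pixels) per instruction, but the arithmetic stays the same, so the hand-written part really is just this loop.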

> Furthermore, using the accelerator for small tasks (eg blitting a glyph)
> won't be much faster than doing it with the CPU, since the engine setup is
> about the same amount of code as blitting eg. 64 bits into the
> framebuffer with the CPU.

yup. this is where i'd imagine the accelerator going "test speed. setup time is
X, run time is Y, software/cpu time for the same test is Z. if Z < (X + Y) then
use software" (and do this for some key sizes like "small" (8x8), "medium"
(64x64) and "large" (512x512) for example... :) this will put that logic in one
place and give it the best opportunity of making the optimal decision, because
the accelerator knows more about the hardware than anyone else. also the tests
are done once on x startup :) well, that's how i envisage the code :)
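The startup calibration described above could be sketched as a small decision table. The size classes come from the mail; `AccelCalibration`, `calibrate` and `should_use_software` are hypothetical, and how X, Y and Z are actually timed at server startup is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Size classes from the mail: small (8x8), medium (64x64), large (512x512). */
enum { SIZE_SMALL, SIZE_MEDIUM, SIZE_LARGE, SIZE_CLASSES };
static const unsigned class_edge[SIZE_CLASSES] = { 8, 64, 512 };

/* Times measured once at server startup, per size class (any unit, as long
 * as it is consistent): X = engine setup, Y = engine run, Z = CPU time. */
typedef struct {
    double setup_us[SIZE_CLASSES];   /* X */
    double hw_run_us[SIZE_CLASSES];  /* Y */
    double sw_us[SIZE_CLASSES];      /* Z */
    bool   use_sw[SIZE_CLASSES];     /* decision, filled in by calibrate() */
} AccelCalibration;

/* Software wins a size class when Z < X + Y. */
void calibrate(AccelCalibration *c)
{
    for (int i = 0; i < SIZE_CLASSES; i++)
        c->use_sw[i] = c->sw_us[i] < c->setup_us[i] + c->hw_run_us[i];
}

/* Smallest class whose edge covers the operation; bigger ones count as large. */
static int size_class(unsigned w, unsigned h)
{
    unsigned edge = w > h ? w : h;
    for (int i = 0; i < SIZE_CLASSES - 1; i++)
        if (edge <= class_edge[i])
            return i;
    return SIZE_LARGE;
}

bool should_use_software(const AccelCalibration *c, unsigned w, unsigned h)
{
    assert(w > 0 && h > 0);
    return c->use_sw[size_class(w, h)];
}
```

At composite time the driver would call `should_use_software(&cal, w, h)` and take the CPU path when it returns true. Keeping the table inside the driver matches the point above: only the accelerator code knows its own setup costs.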

> >>Hm, the background never looks like that gfx during render tests. I just
> >>see a rainbow-like gradient from top/left to bottom/right... (no matter
> >>whether with or without the acceleration)
>
> I'll send you a screen shot (by private mail)
>
> >>That could be a problem. So far, I haven't found a suitable hook for
> >>this (and replacing the entire composite function seems a bit far-
> >>fetched at the moment)
> >
> > hmmm but you would likely be better off wrapping it. special-case the 1:1
> > (as currently) on a per-call basis (do it within the call) and detect certain
> > transforms (ie non-rotation/skew ones), since scaling blitters often only do
> > simple pixel scaling - not full matrix transforms (tho my knowledge of this
> > may be waaay out of date by now) - and pump them through the accelerator :)
>
> Since composite uses almost no internal hooks, it's either all
> or nothing. And this function does a LOT. I'll have a closer look at the
> weekend.

ok cool. i'll admit i know very little of xrender's server-side internal api. i
may just be babbling on - i'm just commenting from my prior experience with
doing things like this (with hardware years ago, and in software space).
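The transform check suggested in the quoted paragraph above (send 1:1 and pure scaling cases to the blitter, and leave rotation and skew to software) might be sketched like this. RENDER attaches a 3x3 matrix of fixed-point values to a picture, but `Transform3x3` here is a hypothetical stand-in using `double`, and `classify_transform` is an invented name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a picture transform: row-major 3x3 projective
 * matrix applied to (x, y, 1). */
typedef struct { double m[3][3]; } Transform3x3;

typedef enum { XFORM_IDENTITY, XFORM_SCALE, XFORM_GENERAL } XformKind;

/* Exact comparisons are intended: only transforms that are exactly
 * axis-aligned qualify for the simple scaling blitter. */
XformKind classify_transform(const Transform3x3 *t)
{
    assert(t != NULL);
    const double (*m)[3] = t->m;

    /* Bottom row must be (0, 0, 1): no projective component. */
    bool affine = m[2][0] == 0.0 && m[2][1] == 0.0 && m[2][2] == 1.0;
    /* No off-diagonal terms: no rotation or skew. */
    bool axis_aligned = m[0][1] == 0.0 && m[1][0] == 0.0;
    /* Flips (negative scale) are treated as general, conservatively. */
    bool positive = m[0][0] > 0.0 && m[1][1] > 0.0;

    if (!affine || !axis_aligned || !positive)
        return XFORM_GENERAL;
    if (m[0][0] == 1.0 && m[1][1] == 1.0 && m[0][2] == 0.0 && m[1][2] == 0.0)
        return XFORM_IDENTITY;
    return XFORM_SCALE;   /* scale and/or translation only */
}
```

A wrapper around composite could then accelerate `XFORM_IDENTITY` and `XFORM_SCALE` and pass everything else to the existing software path. Flips are routed to software because a simple scaling blitter may not mirror images.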

-- 
--------------- Codito, ergo sum - "I code, therefore I am" --------------------
The Rasterman (Carsten Haitzler)    [EMAIL PROTECTED]
熊耳 -
http://XFree86.Org/mailman/listinfo/devel