>That sounds like a little bit of a special case - it'll work where you're 
>using putimage for a large area, that has very few pixels set. 

 

That is exactly what I have almost all the time.  I want to use putimage 
for the entire screen all the time, but very few pixels on the screen change at 
any given screen update.  I have already tried using putimage for only part of 
the screen, and that helps sometimes, but most of the time it doesn’t help 
enough, because I’ll be drawing a long diagonal line or a big ellipse, and 
enclosing the entity in a rectangle ends up using a good portion of the 
screen.  By the time I get done calculating the only slightly smaller area, 
I might as well have just done putimage on the entire screen.

 

>Perhaps just reimplementing the general algorithm in inline asm, by using SSE 
>(or MMX) vector instructions would be the fastest

 

That sounds completely over my head 😊 

 

>but maybe it's not worth the pain

 

Maybe not… I’m pretty sure I can handle processing the second array, but how 
and where to create it in aggpas, I have no idea… yet.  I have not actually 
looked at how aggpas puts data in the buffer yet.  It’s a huge package and 
I’m not sure what unit that code is even in.

 

>and the pascal implementation is fast enough for you.

 

It’s not quite fast enough yet…  

 

>Just experiment and see what works best :)

 

Sounds like fun 😊   Maybe I’ll do a test and pre-build the second array myself, 
just to see if there is any real benefit to this whole idea, and if there is, 
then I’ll try to figure out how to do it with aggpas.
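
A rough sketch of what such a test could look like, assuming the "second array" is simply a list of the buffer offsets that were written since the last update. That reading is an assumption, and every name here (Image, Display, Changed, SetPixel, FlushChanged) is made up for illustration; none of it is aggpas or ptcgraph code.

```pascal
program SparseUpdate;
{$mode objfpc}

const
  W = 640;   // hypothetical screen width
  H = 480;   // hypothetical screen height

var
  // Stand-ins for the drawing buffer and the screen buffer.
  Image, Display: array[0..W * H - 1] of Word;
  // Hypothetical "second array": offsets of pixels written since the last flush.
  Changed: array of LongInt;
  ChangedCount: Integer = 0;

procedure SetPixel(x, y: Integer; c: Word);
begin
  Image[y * W + x] := c;
  // Grow the list if needed; duplicates are not filtered out.
  if ChangedCount = Length(Changed) then
    SetLength(Changed, Length(Changed) * 2);
  Changed[ChangedCount] := y * W + x;
  Inc(ChangedCount);
end;

procedure FlushChanged;
var
  i: Integer;
begin
  // Copy only the recorded pixels instead of the whole buffer.
  for i := 0 to ChangedCount - 1 do
    Display[Changed[i]] := Image[Changed[i]];
  ChangedCount := 0;
end;

var
  i: Integer;
begin
  SetLength(Changed, 1024);
  // A long diagonal line: 480 pixels, but its bounding box is 480x480.
  for i := 0 to H - 1 do
    SetPixel(i, i, $FFFF);
  FlushChanged;
  WriteLn(Display[1 * W + 1]);   // pixel (1,1) was on the diagonal
end.
```

Here the flush touches 480 pixels rather than the 230,400 inside the bounding rectangle. Whether that gain survives the extra bookkeeping in SetPixel would have to be measured.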

 

>Btw, I looked at your code again and saw a quick and cheap optimization - just 
>move the case statement (case BitBlt of) outside the inner loop (for i:=X to 
>X1 do), so the value of BitBlt is not checked once every pixel, but once per 
>row.



Great idea.  I took it one step further: wanting it to be as fast as possible, 
I only check BitBlt once for the entire nested loop.  I also made a combined 
procedure for both 8bpp and 16bpp.  This is about 7% faster.  You can see it 
here:

https://github.com/Zaaphod/ptcpas/compare/Avoid_ReChecking_Bitblt#diff-fb31461e009ff29fda5c35c5115978b4
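
For readers who don't want to open the diff, the general shape of checking the mode once for the whole nested loop is sketched below. This is a minimal illustration and not the code in the branch: TPutMode, PutBlock and the small fixed-size arrays are hypothetical stand-ins.

```pascal
program HoistCase;
{$mode objfpc}

type
  // Hypothetical stand-in for the write modes; not the ptcgraph declarations.
  TPutMode = (pmNormal, pmXor, pmOr);

const
  W = 4;
  H = 3;

var
  Src, Dst: array[0..H - 1, 0..W - 1] of Word;

procedure PutBlock(Mode: TPutMode);
var
  x, y: Integer;
begin
  // The mode is tested once; each branch carries its own copy of the loops.
  case Mode of
    pmNormal:
      for y := 0 to H - 1 do
        for x := 0 to W - 1 do
          Dst[y, x] := Src[y, x];
    pmXor:
      for y := 0 to H - 1 do
        for x := 0 to W - 1 do
          Dst[y, x] := Dst[y, x] xor Src[y, x];
    pmOr:
      for y := 0 to H - 1 do
        for x := 0 to W - 1 do
          Dst[y, x] := Dst[y, x] or Src[y, x];
  end;
end;

var
  i, j: Integer;
begin
  for j := 0 to H - 1 do
    for i := 0 to W - 1 do
    begin
      Src[j, i] := $00FF;
      Dst[j, i] := $0F0F;
    end;
  PutBlock(pmXor);
  WriteLn(HexStr(Dst[0, 0], 4));   // $0F0F xor $00FF = $0FF0
end.
```

The cost is the duplicated loop body per mode; the benefit is that no per-pixel or per-row branch on the mode remains.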

 

 

>Try rearranging that like this:

>..

>Code

>..

>Note that all array calculation and the case is removed from the inner most 
>loop, at the expense of duplicating the for loop. 

>The index is not used in the for loop and made 0 based to allow the tightest 
>FOR loop code generation.

 

I also tried this, but for some strange reason it’s slower, clocking in at 
1.773s for my 1000x loop instead of 1.056s.  Maybe I did something wrong.  Here 
is what I did:

https://github.com/Zaaphod/ptcpas/compare/Restructure_PutImage_Loop#diff-fb31461e009ff29fda5c35c5115978b4

Maybe the two inc(pdest); inc(psrc); inside the inner loop are slower than the 
inc(k)?
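
One way to check that question in isolation is a small stand-alone timing test along these lines. It is only a sketch: the buffer size, pass count and procedure names are hypothetical, it does a plain copy rather than the real putimage work, and it makes no claim about which form wins. The result will likely depend on the optimization level (for example -O2 versus -O3) and the target CPU.

```pascal
program LoopForms;
{$mode objfpc}

uses
  SysUtils;

const
  N = 1024 * 768;   // hypothetical buffer size in pixels
  Passes = 200;     // hypothetical repeat count

var
  Src, Dst: array of Word;

// Variant 1: one loop index used to address both arrays.
procedure CopyIndexed;
var
  k: Integer;
begin
  for k := 0 to N - 1 do
    Dst[k] := Src[k];
end;

// Variant 2: two pointers advanced inside the loop; k only counts.
procedure CopyPointers;
var
  pSrc, pDst: PWord;
  k: Integer;
begin
  pSrc := @Src[0];
  pDst := @Dst[0];
  for k := 0 to N - 1 do
  begin
    pDst^ := pSrc^;
    Inc(pSrc);
    Inc(pDst);
  end;
end;

var
  i: Integer;
  t: QWord;
begin
  SetLength(Src, N);
  SetLength(Dst, N);
  for i := 0 to N - 1 do
    Src[i] := i and $FFFF;

  t := GetTickCount64;
  for i := 1 to Passes do
    CopyIndexed;
  WriteLn('indexed:  ', GetTickCount64 - t, ' ms');

  t := GetTickCount64;
  for i := 1 to Passes do
    CopyPointers;
  WriteLn('pointers: ', GetTickCount64 - t, ' ms');
end.
```

Running both variants in the same program, with the same compiler switches, at least rules out differences in the surrounding code as the cause.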

 

 

James

 

 




