Re: [fpc-pascal] for loops performance problems?

2017-07-04 Thread Peter
I usually start performance investigations by compiling with '-al', and
looking at the generated assembler.
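
A minimal sketch of that workflow, assuming a hypothetical test file named looptest.pas (the program below is made up for illustration and is not taken from the gist in this thread). Compiling with -al keeps the assembler file and interleaves the source lines, so the code generated for each loop can be read in looptest.s:

```pascal
program looptest;
{$mode objfpc}

// Hypothetical test case, not the code from the gist.
// Compile with:  fpc -O2 -al looptest.pas
// then open looptest.s and look at the code generated for the loops.

var
  Values: array[0..255] of Single;
  I: Integer;
  Sum: Single;
begin
  // Fill the array with simple Single values.
  for I := Low(Values) to High(Values) do
    Values[I] := I * 0.5;

  // A loop doing Single multiplication, similar in spirit to the
  // per-vertex work described in this thread.
  Sum := 0;
  for I := Low(Values) to High(Values) do
    Sum := Sum + Values[I] * 2.0;

  WriteLn(Sum:0:1);
end.
```

The -O2 switch is only an assumption about the optimization level of interest; comparing the .s output at different levels may be just as informative.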

Regards,
Peter

P.S. From what we know so far, I'm inclined to agree with Charlie.
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] Implementing AggPas with PtcGraph

2017-07-04 Thread James Richters
I defined a static array that maps each mode 13h VGA palette index to
separate red, green and blue values to send to AggPas, as well as the packed
RGB565 value that ptcgraph needs for 16-bit colors.
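
For reference, the packed value appears to follow the usual RGB565 layout (top 5 bits of red, top 6 bits of green, top 5 bits of blue). The sketch below uses a hypothetical helper named PackRGB565, which is not part of AggPas or ptcgraph, and checks it against three entries from the table in this message:

```pascal
program rgb565demo;
{$mode objfpc}

// Hypothetical helper: pack 8-bit channels into a 16-bit RGB565 word.
// Red keeps its top 5 bits, green its top 6 bits, blue its top 5 bits.
function PackRGB565(R, G, B: Byte): Word;
begin
  Result := Word(((R shr 3) shl 11) or ((G shr 2) shl 5) or (B shr 3));
end;

begin
  // R, G, B values taken from entries of the VGA256 table.
  WriteLn(HexStr(PackRGB565($00, $00, $AA), 4)); // 0015
  WriteLn(HexStr(PackRGB565($AA, $55, $00), 4)); // AAA0
  WriteLn(HexStr(PackRGB565($FF, $7D, $00), 4)); // FBE0
end.
```

Storing the packed value in the table trades a little memory for not recomputing it on every pixel; computing it on the fly as above would work as well.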

 

James

Type
   VGARGBRec = Record
      R,G,B:Byte;   // separate 8-bit channels
      RGB:Word;     // packed RGB565 value
   end;

Const
   VGA256: Array[0..255] of VGARGBRec = (
      (R:$00; G:$00; B:$00; RGB:$0000),
      (R:$00; G:$00; B:$AA; RGB:$0015),
      (R:$00; G:$AA; B:$00; RGB:$0540),
      (R:$00; G:$AA; B:$AA; RGB:$0555),
      (R:$AA; G:$00; B:$00; RGB:$A800),
      (R:$AA; G:$00; B:$AA; RGB:$A815),
      (R:$AA; G:$55; B:$00; RGB:$AAA0),
      (R:$AA; G:$AA; B:$AA; RGB:$AD55),
      (R:$55; G:$55; B:$55; RGB:$52AA),
      (R:$55; G:$55; B:$FF; RGB:$52BF),
      (R:$55; G:$FF; B:$55; RGB:$57EA),
      (R:$55; G:$FF; B:$FF; RGB:$57FF),
      (R:$FF; G:$55; B:$55; RGB:$FAAA),
      (R:$FF; G:$55; B:$FF; RGB:$FABF),
      (R:$FF; G:$FF; B:$55; RGB:$FFEA),
      (R:$FF; G:$FF; B:$FF; RGB:$FFFF),
      (R:$00; G:$00; B:$00; RGB:$0000),
      (R:$14; G:$14; B:$14; RGB:$10A2),
      (R:$20; G:$20; B:$20; RGB:$2104),
      (R:$2C; G:$2C; B:$2C; RGB:$2965),
      (R:$38; G:$38; B:$38; RGB:$39C7),
      (R:$44; G:$44; B:$44; RGB:$4228),
      (R:$50; G:$50; B:$50; RGB:$528A),
      (R:$61; G:$61; B:$61; RGB:$630C),
      (R:$71; G:$71; B:$71; RGB:$738E),
      (R:$81; G:$81; B:$81; RGB:$8410),
      (R:$91; G:$91; B:$91; RGB:$9492),
      (R:$A1; G:$A1; B:$A1; RGB:$A514),
      (R:$B6; G:$B6; B:$B6; RGB:$B5B6),
      (R:$CA; G:$CA; B:$CA; RGB:$CE59),
      (R:$E2; G:$E2; B:$E2; RGB:$E71C),
      (R:$FF; G:$FF; B:$FF; RGB:$FFFF),
      (R:$00; G:$00; B:$FF; RGB:$001F),
      (R:$40; G:$00; B:$FF; RGB:$401F),
      (R:$7D; G:$00; B:$FF; RGB:$781F),
      (R:$BE; G:$00; B:$FF; RGB:$B81F),
      (R:$FF; G:$00; B:$FF; RGB:$F81F),
      (R:$FF; G:$00; B:$BE; RGB:$F817),
      (R:$FF; G:$00; B:$7D; RGB:$F80F),
      (R:$FF; G:$00; B:$40; RGB:$F808),
      (R:$FF; G:$00; B:$00; RGB:$F800),
      (R:$FF; G:$40; B:$00; RGB:$FA00),
      (R:$FF; G:$7D; B:$00; RGB:$FBE0),
      (R:$FF; G:$BE; B:$00; RGB:$FDE0),
      (R:$FF; G:$FF; B:$00; RGB:$FFE0),
      (R:$BE; G:$FF; B:$00; RGB:$BFE0),
      (R:$7D; G:$FF; B:$00; RGB:$7FE0),
      (R:$40; G:$FF; B:$00; RGB:$47E0),
      (R:$00; G:$FF; B:$00; RGB:$07E0),
      (R:$00; G:$FF; B:$40; RGB:$07E8),
      (R:$00; G:$FF; B:$7D; RGB:$07EF),
      (R:$00; G:$FF; B:$BE; RGB:$07F7),
      (R:$00; G:$FF; B:$FF; RGB:$07FF),
      (R:$00; G:$BE; B:$FF; RGB:$05FF),
      (R:$00; G:$7D; B:$FF; RGB:$03FF),
      (R:$00; G:$40; B:$FF; RGB:$021F),
      (R:$7D; G:$7D; B:$FF; RGB:$7BFF),
      (R:$9D; G:$7D; B:$FF; RGB:$9BFF),
      (R:$BE; G:$7D; B:$FF; RGB:$BBFF),
      (R:$DE; G:$7D; B:$FF; RGB:$DBFF),
      (R:$FF; G:$7D; B:$FF; RGB:$FBFF),
      (R:$FF; G:$7D; B:$DE; RGB:$FBFB),
      (R:$FF; G:$7D; B:$BE; RGB:$FBF7),
      (R:$FF; G:$7D; B:$9D; RGB:$FBF3),
      (R:$FF; G:$7D; B:$7D; RGB:$FBEF),
      (R:$FF; G:$9D; B:$7D; RGB:$FCEF),
      (R:$FF; G:$BE; B:$7D; RGB:$FDEF),
      (R:$FF; G:$DE; B:$7D; RGB:$FEEF),
      (R:$FF; G:$FF; B:$7D; RGB:$FFEF),
      (R:$DE; G:$FF; B:$7D; RGB:$DFEF),
      (R:$BE; G:$FF; B:$7D; RGB:$BFEF),
      (R:$9D; G:$FF; B:$7D; RGB:$9FEF),
      (R:$7D; G:$FF; B:$7D; RGB:$7FEF),
      (R:$7D; G:$FF; B:$9D; RGB:$7FF3),
      (R:$7D; G:$FF; B:$BE; RGB:$7FF7),
      (R:$7D; G:$FF; B:$DE; RGB:$7FFB),
      (R:$7D; G:$FF; B:$FF; RGB:$7FFF),
      (R:$7D; G:$DE; B:$FF; RGB:$7EFF),
      (R:$7D; G:$BE; B:$FF; RGB:$7DFF),
      (R:$7D; G:$9D; B:$FF; RGB:$7CFF),
      (R:$B6; G:$B6; B:$FF; RGB:$B5BF),
      (R:$C6; G:$B6; B:$FF; RGB:$C5BF),
      (R:$DA; G:$B6; B:$FF; RGB:$DDBF),
      (R:$EA; G:$B6; B:$FF; RGB:$EDBF),
      (R:$FF; G:$B6; B:$FF; RGB:$FDBF),
      (R:$FF; G:$B6; B:$EA; RGB:$FDBD),
      (R:$FF; G:$B6; B:$DA; RGB:$FDBB),
      (R:$FF; G:$B6; B:$C6; RGB:$FDB8),
      (R:$FF; G:$B6; B:$B6; RGB:$FDB6),
      (R:$FF; G:$C6; B:$B6; RGB:$FE36),
      (R:$FF; G:$DA; B:$B6; RGB:$FED6),
      (R:$FF; G:$EA; B:$B6; RGB:$FF56),
      (R:$FF; G:$FF; B:$B6; RGB:$FFF6),
      (R:$EA; G:$FF; B:$B6; RGB:$EFF6),
      (R:$DA; G:$FF; B:$B6; RGB:$DFF6),
      (R:$C6; G:$FF; B:$B6; RGB:$C7F6),
      (R:$B6; G:$FF; B:$B6; RGB:$B7F6),
      (R:$B6; G:$FF; B:$C6; RGB:$B7F8),
      (R:$B6; G:$FF; B:$DA; RGB:$B7FB),
      (R:$B6; G:$FF; B:$EA; RGB:$B7FD),
      (R:$B6; G:$FF; B:$FF; RGB:$B7FF),
      (R:$B6; G:$EA; B:$FF; RGB:$B75F),
      (R:$B6; G:$DA; B:$FF; RGB:$B6DF),
      (R:$B6; G:$C6; B:$FF; RGB:$B63F),
      (R:$00; G:$00; B:$71; RGB:$000E),
      (R:$1C; G:$00; B:$71; RGB:$180E),
      (R:$38; G:$00; B:$71; RGB:$380E),
      (R:$55; G:$00; B:$71; RGB:$500E),
      (R:$71; G:$00; B:$71; RGB:$700E),
      (R:$71; G:$00; B:$55; RGB:$700A),
      (R:$71; G:$00; B:$38; RGB:$7007),
      (R:$71; G:$00; B:$1C; RGB:$7003),
      (R:$71; G:$00; B:$00; RGB:$7000),
      (R:$71; G:$1C; B:$00; 

Re: [fpc-pascal] for loops performance problems?

2017-07-04 Thread José Mejuto

On 04/07/2017 at 11:09, Anthony Walter wrote:

> I can convert to static buffers and get good performance (if I know the
> text isn't changing), but I'm now curious if this specific performance
> issue is related to fpc's for loop code generation.
>
> What do you think?


Hello,

AFAIK the problem was/is in some floating-point maths, not in the loops,
plus the partial/full SSA that is missing in fpc.



Re: [fpc-pascal] for loops performance problems?

2017-07-04 Thread Karoly Balogh (Charlie/SGR)
Hi,

On Tue, 4 Jul 2017, Anthony Walter wrote:

> I think the code to generate the geometry each frame isn't that complex,
> and I pre-allocate room in my buffer for all the geometry just once, so
> it seems doing the calculations for the geometry is what's killing the
> performance. The calculations are simple multiplication of "Single"
> type, and I am thinking maybe the "for looping" part is what's degrading
> performance. 
>
> Here is the gist of the loop that generates the text vertex buffer:
>
> https://gist.github.com/sysrpl/8af6e5a9d62cc2f2a1c40f9a9ae13b64

Well, first, please provide a compilable and runnable example for further
investigation.

> I can convert to static buffers and get good performance (if I know the
> text isn't changing), but I'm now curious if this specific performance
> issue is related to fpc's for loop code generation. 

No, it's probably the fact that you're doing 10 function calls per glyph
setup in the "World." part of your for loop, each involving its own set
of register save/restore, etc. I'd say that's probably much slower than
any performance degradation which might arise from the fact that fpc
doesn't do SSA in for loops.
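
A rough, compilable sketch of how that call overhead could be measured. Everything here is hypothetical: ScaleCall and ScaleInline are stand-ins invented for illustration, since the real "World." methods are not shown in this thread, and the iteration count is arbitrary. It times the same Single multiplication once through an ordinary function call and once through a function marked inline:

```pascal
program callcost;
{$mode objfpc}
{$inline on}

uses
  SysUtils;

// Hypothetical stand-in for one small per-glyph calculation.
function ScaleCall(X, Factor: Single): Single;
begin
  Result := X * Factor;
end;

// Same calculation, but the compiler is asked to inline it.
function ScaleInline(X, Factor: Single): Single; inline;
begin
  Result := X * Factor;
end;

const
  Iterations = 50000000; // arbitrary; adjust for the target machine
var
  I: Integer;
  Acc: Single;
  T0: QWord;
begin
  Acc := 0;
  T0 := GetTickCount64;
  for I := 1 to Iterations do
    Acc := Acc + ScaleCall(I and 255, 0.5);
  // Acc is printed so the loop cannot be discarded as unused work.
  WriteLn('plain call : ', GetTickCount64 - T0, ' ms (', Acc:0:1, ')');

  Acc := 0;
  T0 := GetTickCount64;
  for I := 1 to Iterations do
    Acc := Acc + ScaleInline(I and 255, 0.5);
  WriteLn('inline     : ', GetTickCount64 - T0, ' ms (', Acc:0:1, ')');
end.
```

The numbers depend heavily on the target CPU and optimization settings, and the two timings may turn out close; the generated assembler remains the more reliable evidence.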

But because the example you provided is not compilable, I cannot give
further hints, and the above is just speculation.

Charlie

[fpc-pascal] for loops performance problems?

2017-07-04 Thread Anthony Walter
I recall earlier this year some people in this mailing list were discussing
surprising performance problems with fpc and for loops. I wanted to know if
this is still an existing problem as I am experiencing some unusual
performance degradation related to a for loop in one of my test
applications.

Here is a description of my test application:

http://cache.getlazarus.org/videos/fonts.mp4 (vsync on for recording
purposes)

An opengl window which renders example text of various fonts. The user can
press a key to cycle through the available fonts to see how they look as
textured billboard sprites. The text displays in a few paragraphs.

The performance issue:

Adding a paragraph of sample text greatly reduces the opengl frame rate. On
some systems, like the raspberry pi, the frame rate can drop to 10 frames a
second. This seems like too low a frame rate given that it's
actually not a lot of geometry (4 vertices or colors per character).

When I turn on geometry buffering, that is storing the vertex information,
then drawing using a user memory vertex buffer, the frame rate skyrockets
to 200+ fps (vsync is off) on a raspberry.

I think the code to generate the geometry each frame isn't that complex,
and I pre-allocate room in my buffer for all the geometry just once, so it
seems doing the calculations for the geometry is what's killing the
performance. The calculations are simple multiplication of "Single" type,
and I am thinking maybe the "for looping" part is what's degrading
performance.

Here is the gist of the loop that generates the text vertex buffer:

https://gist.github.com/sysrpl/8af6e5a9d62cc2f2a1c40f9a9ae13b64

I can convert to static buffers and get good performance (if I know the
text isn't changing), but I'm now curious if this specific performance
issue is related to fpc's for loop code generation.

What do you think?