Re: [fpc-pascal] for loops performance problems?
I usually start performance investigations by compiling with '-al' and looking at the generated assembler.

Regards,
Peter

P.S. From what we know so far, I'm inclined to agree with Charlie.

___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
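A minimal sketch of that workflow, assuming a hypothetical file named looptest.pas (the program and its names are made up for illustration and do not come from the thread). Compiling it with something like `fpc -al -O2 looptest.pas` should leave the generated assembler file next to the source (typically looptest.s), with the Pascal source lines interleaved, so the loop body can be inspected directly.

```pascal
program looptest;

{$mode objfpc}

// Hypothetical example: sums scaled values in a for loop, so there is
// something small and recognisable to look for in the assembler listing.
function SumScaled(Count: Integer; Scale: Single): Single;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Count do
    Result := Result + I * Scale;
end;

begin
  // 0.5 + 1.0 + 1.5 + 2.0
  WriteLn(SumScaled(4, 0.5):0:2);
end.
```

Searching the listing for the line containing `Result := Result + I * Scale` shows which instructions the compiler emitted for the floating-point work inside the loop.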
Re: [fpc-pascal] Implementing AggPas with PtcGraph
I defined a static array that converts the mode 13h VGA palette into separate red, green and blue values to send to AggPas, as well as the packed RGB565 format that PtcGraph needs for 16-bit colors.

James

Type
   VGARGBRec = Record
      R,G,B:Byte;
      RGB:Word;
   end;

Const
   VGA256: Array[0..255] of VGARGBRec = (
   (R:$00; G:$00; B:$00; RGB:$0000), (R:$00; G:$00; B:$AA; RGB:$0015),
   (R:$00; G:$AA; B:$00; RGB:$0540), (R:$00; G:$AA; B:$AA; RGB:$0555),
   (R:$AA; G:$00; B:$00; RGB:$A800), (R:$AA; G:$00; B:$AA; RGB:$A815),
   (R:$AA; G:$55; B:$00; RGB:$AAA0), (R:$AA; G:$AA; B:$AA; RGB:$AD55),
   (R:$55; G:$55; B:$55; RGB:$52AA), (R:$55; G:$55; B:$FF; RGB:$52BF),
   (R:$55; G:$FF; B:$55; RGB:$57EA), (R:$55; G:$FF; B:$FF; RGB:$57FF),
   (R:$FF; G:$55; B:$55; RGB:$FAAA), (R:$FF; G:$55; B:$FF; RGB:$FABF),
   (R:$FF; G:$FF; B:$55; RGB:$FFEA), (R:$FF; G:$FF; B:$FF; RGB:$FFFF),
   (R:$00; G:$00; B:$00; RGB:$0000), (R:$14; G:$14; B:$14; RGB:$10A2),
   (R:$20; G:$20; B:$20; RGB:$2104), (R:$2C; G:$2C; B:$2C; RGB:$2965),
   (R:$38; G:$38; B:$38; RGB:$39C7), (R:$44; G:$44; B:$44; RGB:$4228),
   (R:$50; G:$50; B:$50; RGB:$528A), (R:$61; G:$61; B:$61; RGB:$630C),
   (R:$71; G:$71; B:$71; RGB:$738E), (R:$81; G:$81; B:$81; RGB:$8410),
   (R:$91; G:$91; B:$91; RGB:$9492), (R:$A1; G:$A1; B:$A1; RGB:$A514),
   (R:$B6; G:$B6; B:$B6; RGB:$B5B6), (R:$CA; G:$CA; B:$CA; RGB:$CE59),
   (R:$E2; G:$E2; B:$E2; RGB:$E71C), (R:$FF; G:$FF; B:$FF; RGB:$FFFF),
   (R:$00; G:$00; B:$FF; RGB:$001F), (R:$40; G:$00; B:$FF; RGB:$401F),
   (R:$7D; G:$00; B:$FF; RGB:$781F), (R:$BE; G:$00; B:$FF; RGB:$B81F),
   (R:$FF; G:$00; B:$FF; RGB:$F81F), (R:$FF; G:$00; B:$BE; RGB:$F817),
   (R:$FF; G:$00; B:$7D; RGB:$F80F), (R:$FF; G:$00; B:$40; RGB:$F808),
   (R:$FF; G:$00; B:$00; RGB:$F800), (R:$FF; G:$40; B:$00; RGB:$FA00),
   (R:$FF; G:$7D; B:$00; RGB:$FBE0), (R:$FF; G:$BE; B:$00; RGB:$FDE0),
   (R:$FF; G:$FF; B:$00; RGB:$FFE0), (R:$BE; G:$FF; B:$00; RGB:$BFE0),
   (R:$7D; G:$FF; B:$00; RGB:$7FE0), (R:$40; G:$FF; B:$00; RGB:$47E0),
   (R:$00; G:$FF; B:$00; RGB:$07E0), (R:$00; G:$FF; B:$40; RGB:$07E8),
   (R:$00; G:$FF; B:$7D; RGB:$07EF), (R:$00; G:$FF; B:$BE; RGB:$07F7),
   (R:$00; G:$FF; B:$FF; RGB:$07FF), (R:$00; G:$BE; B:$FF; RGB:$05FF),
   (R:$00; G:$7D; B:$FF; RGB:$03FF), (R:$00; G:$40; B:$FF; RGB:$021F),
   (R:$7D; G:$7D; B:$FF; RGB:$7BFF), (R:$9D; G:$7D; B:$FF; RGB:$9BFF),
   (R:$BE; G:$7D; B:$FF; RGB:$BBFF), (R:$DE; G:$7D; B:$FF; RGB:$DBFF),
   (R:$FF; G:$7D; B:$FF; RGB:$FBFF), (R:$FF; G:$7D; B:$DE; RGB:$FBFB),
   (R:$FF; G:$7D; B:$BE; RGB:$FBF7), (R:$FF; G:$7D; B:$9D; RGB:$FBF3),
   (R:$FF; G:$7D; B:$7D; RGB:$FBEF), (R:$FF; G:$9D; B:$7D; RGB:$FCEF),
   (R:$FF; G:$BE; B:$7D; RGB:$FDEF), (R:$FF; G:$DE; B:$7D; RGB:$FEEF),
   (R:$FF; G:$FF; B:$7D; RGB:$FFEF), (R:$DE; G:$FF; B:$7D; RGB:$DFEF),
   (R:$BE; G:$FF; B:$7D; RGB:$BFEF), (R:$9D; G:$FF; B:$7D; RGB:$9FEF),
   (R:$7D; G:$FF; B:$7D; RGB:$7FEF), (R:$7D; G:$FF; B:$9D; RGB:$7FF3),
   (R:$7D; G:$FF; B:$BE; RGB:$7FF7), (R:$7D; G:$FF; B:$DE; RGB:$7FFB),
   (R:$7D; G:$FF; B:$FF; RGB:$7FFF), (R:$7D; G:$DE; B:$FF; RGB:$7EFF),
   (R:$7D; G:$BE; B:$FF; RGB:$7DFF), (R:$7D; G:$9D; B:$FF; RGB:$7CFF),
   (R:$B6; G:$B6; B:$FF; RGB:$B5BF), (R:$C6; G:$B6; B:$FF; RGB:$C5BF),
   (R:$DA; G:$B6; B:$FF; RGB:$DDBF), (R:$EA; G:$B6; B:$FF; RGB:$EDBF),
   (R:$FF; G:$B6; B:$FF; RGB:$FDBF), (R:$FF; G:$B6; B:$EA; RGB:$FDBD),
   (R:$FF; G:$B6; B:$DA; RGB:$FDBB), (R:$FF; G:$B6; B:$C6; RGB:$FDB8),
   (R:$FF; G:$B6; B:$B6; RGB:$FDB6), (R:$FF; G:$C6; B:$B6; RGB:$FE36),
   (R:$FF; G:$DA; B:$B6; RGB:$FED6), (R:$FF; G:$EA; B:$B6; RGB:$FF56),
   (R:$FF; G:$FF; B:$B6; RGB:$FFF6), (R:$EA; G:$FF; B:$B6; RGB:$EFF6),
   (R:$DA; G:$FF; B:$B6; RGB:$DFF6), (R:$C6; G:$FF; B:$B6; RGB:$C7F6),
   (R:$B6; G:$FF; B:$B6; RGB:$B7F6), (R:$B6; G:$FF; B:$C6; RGB:$B7F8),
   (R:$B6; G:$FF; B:$DA; RGB:$B7FB), (R:$B6; G:$FF; B:$EA; RGB:$B7FD),
   (R:$B6; G:$FF; B:$FF; RGB:$B7FF), (R:$B6; G:$EA; B:$FF; RGB:$B75F),
   (R:$B6; G:$DA; B:$FF; RGB:$B6DF), (R:$B6; G:$C6; B:$FF; RGB:$B63F),
   (R:$00; G:$00; B:$71; RGB:$000E), (R:$1C; G:$00; B:$71; RGB:$180E),
   (R:$38; G:$00; B:$71; RGB:$380E), (R:$55; G:$00; B:$71; RGB:$500E),
   (R:$71; G:$00; B:$71; RGB:$700E), (R:$71; G:$00; B:$55; RGB:$700A),
   (R:$71; G:$00; B:$38; RGB:$7007), (R:$71; G:$00; B:$1C; RGB:$7003),
   (R:$71; G:$00; B:$00; RGB:$7000), (R:$71; G:$1C; B:$00;
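The RGB field in each entry follows from the R, G and B bytes by the usual RGB565 packing: the top 5 bits of red, the top 6 bits of green and the top 5 bits of blue. A minimal sketch of that conversion, assuming a hypothetical helper named PackRGB565 (not part of the original message, AggPas or ptcgraph):

```pascal
program rgb565demo;

{$mode objfpc}

// Hypothetical helper: packs 8-bit R, G, B into a 16-bit RGB565 word.
// Red keeps its top 5 bits, green its top 6 bits, blue its top 5 bits.
function PackRGB565(R, G, B: Byte): Word;
begin
  Result := (Word(R shr 3) shl 11) or (Word(G shr 2) shl 5) or Word(B shr 3);
end;

begin
  // Spot checks against entries from the table above.
  WriteLn(HexStr(PackRGB565($00, $00, $AA), 4));  // table entry: $0015
  WriteLn(HexStr(PackRGB565($AA, $55, $00), 4));  // table entry: $AAA0
  WriteLn(HexStr(PackRGB565($AA, $AA, $AA), 4));  // table entry: $AD55
end.
```

Computing the word at run time like this is an alternative to storing it. The static table trades a little memory for skipping the shifts on every lookup.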
Re: [fpc-pascal] for loops performance problems?
On 04/07/2017 at 11:09, Anthony Walter wrote:

> I can convert to static buffers and get good performance (if I know the
> text isn't changing), but I'm now curious if this specific performance
> issue is related to fpc's for loop code generation. What do you think?

Hello,

AFAIK the problem was (and is) some floating-point maths rather than the loops themselves, plus the partial/full SSA support that fpc lacks.
Re: [fpc-pascal] for loops performance problems?
Hi,

On Tue, 4 Jul 2017, Anthony Walter wrote:

> I think the code to generate the geometry each frame isn't that complex,
> and I pre-allocate room in my buffer for all the geometry just once, so
> it seems doing the calculations for the geometry is what's killing the
> performance. The calculations are simple multiplication of "Single"
> type, and I am thinking maybe the "for looping" part is what's degrading
> performance.
>
> Here is the gist of the loop that generates the text vertex buffer:
>
> https://gist.github.com/sysrpl/8af6e5a9d62cc2f2a1c40f9a9ae13b64

Well, first, please provide a compilable and runnable example for further investigation.

> I can convert to static buffers and get good performance (if I know the
> text isn't changing), but I'm now curious if this specific performance
> issue is related to fpc's for loop code generation.

No, it's probably the fact that you're doing 10 function calls per glyph setup in the "World." part of your for loop, each involving its own set of register save/restore, etc. I'd say that's probably much slower than any performance degradation which might arise from the fact that fpc doesn't do SSA in for loops.

But because the example you provided is not compilable, I cannot give further hints, and the above is just speculation.

Charlie
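To illustrate the kind of difference Charlie is speculating about, here is a minimal sketch that builds one glyph quad two ways: once through a helper call per vertex, and once by writing the fields directly. All names (TVertex, MakeVertex, QuadWithCalls, QuadDirect) are hypothetical and are not taken from the gist. Whether the call overhead actually matters would have to be measured on the real code.

```pascal
program glyphcalls;

{$mode objfpc}

type
  TVertex = record
    X, Y: Single;
  end;
  TQuad = array[0..3] of TVertex;

// Hypothetical helper standing in for a per-vertex function call.
function MakeVertex(AX, AY: Single): TVertex;
begin
  Result.X := AX;
  Result.Y := AY;
end;

// Variant 1: four function calls per glyph.
procedure QuadWithCalls(X, Y, W, H: Single; out Q: TQuad);
begin
  Q[0] := MakeVertex(X, Y);
  Q[1] := MakeVertex(X + W, Y);
  Q[2] := MakeVertex(X + W, Y + H);
  Q[3] := MakeVertex(X, Y + H);
end;

// Variant 2: same quad, fields written directly, shared sums computed once.
procedure QuadDirect(X, Y, W, H: Single; out Q: TQuad);
var
  X2, Y2: Single;
begin
  X2 := X + W;
  Y2 := Y + H;
  Q[0].X := X;  Q[0].Y := Y;
  Q[1].X := X2; Q[1].Y := Y;
  Q[2].X := X2; Q[2].Y := Y2;
  Q[3].X := X;  Q[3].Y := Y2;
end;

// Returns True when both quads hold identical coordinates.
function SameQuad(const A, B: TQuad): Boolean;
var
  I: Integer;
begin
  Result := True;
  for I := Low(A) to High(A) do
    if (A[I].X <> B[I].X) or (A[I].Y <> B[I].Y) then
      Result := False;
end;

var
  A, B: TQuad;
begin
  QuadWithCalls(1, 2, 3, 4, A);
  QuadDirect(1, 2, 3, 4, B);
  WriteLn(SameQuad(A, B));
end.
```

Both variants produce the same quad, so either can be dropped into a timing loop to see whether the calls are a measurable cost on a given target.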
[fpc-pascal] for loops performance problems?
I recall earlier this year some people in this mailing list were discussing surprising performance problems with fpc and for loops. I wanted to know if this is still an existing problem, as I am experiencing some unusual performance degradation related to a for loop in one of my test applications.

Here is a description of my test application:

http://cache.getlazarus.org/videos/fonts.mp4 (vsync on for recording purposes)

An opengl window which renders example text of various fonts. The user can press a key to cycle through the available fonts to see how they look as textured billboard sprites. The text displays in a few paragraphs.

The performance issue:

Adding a paragraph of sample text greatly reduces the opengl frame rate. On some systems, like the raspberry pi, the frame rate can drop to 10 frames a second. This seems like a very low frame rate given that it's actually not a lot of geometry (4 vertices or colors per character). When I turn on geometry buffering, that is, storing the vertex information and then drawing from a user memory vertex buffer, the frame rate skyrockets to 200+ fps (vsync is off) on a raspberry.

I think the code to generate the geometry each frame isn't that complex, and I pre-allocate room in my buffer for all the geometry just once, so it seems doing the calculations for the geometry is what's killing the performance. The calculations are simple multiplication of "Single" type, and I am thinking maybe the "for looping" part is what's degrading performance.

Here is the gist of the loop that generates the text vertex buffer:

https://gist.github.com/sysrpl/8af6e5a9d62cc2f2a1c40f9a9ae13b64

I can convert to static buffers and get good performance (if I know the text isn't changing), but I'm now curious if this specific performance issue is related to fpc's for loop code generation. What do you think?
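The "static buffer when the text isn't changing" idea can be sketched as a cached vertex buffer that is rebuilt only when the text actually changes. This is a hypothetical illustration, not the code from the gist: the record layout, the fixed glyph size and the names SetText and Update are all assumptions, and no OpenGL calls are involved.

```pascal
program cachedgeometry;

{$mode objfpc}{$H+}

type
  TVertex = record
    X, Y: Single;
  end;

  TTextGeometry = record
    Text: string;
    Vertices: array of TVertex;
    Dirty: Boolean;     // True when Vertices no longer match Text
    Rebuilds: Integer;  // counts how often the geometry was regenerated
  end;

const
  GlyphWidth = 8.0;    // hypothetical fixed advance per character
  GlyphHeight = 16.0;  // hypothetical fixed glyph height

// Marks the geometry dirty only when the text really changes.
procedure SetText(var G: TTextGeometry; const S: string);
begin
  if G.Text <> S then
  begin
    G.Text := S;
    G.Dirty := True;
  end;
end;

// Rebuilds four vertices per character, but only when the text changed.
procedure Update(var G: TTextGeometry);
var
  I, Base: Integer;
  X: Single;
begin
  if not G.Dirty then
    Exit;
  SetLength(G.Vertices, Length(G.Text) * 4);
  for I := 1 to Length(G.Text) do
  begin
    Base := (I - 1) * 4;
    X := (I - 1) * GlyphWidth;
    G.Vertices[Base].X := X;                  G.Vertices[Base].Y := 0;
    G.Vertices[Base + 1].X := X + GlyphWidth; G.Vertices[Base + 1].Y := 0;
    G.Vertices[Base + 2].X := X + GlyphWidth; G.Vertices[Base + 2].Y := GlyphHeight;
    G.Vertices[Base + 3].X := X;              G.Vertices[Base + 3].Y := GlyphHeight;
  end;
  G.Dirty := False;
  Inc(G.Rebuilds);
end;

var
  G: TTextGeometry;
begin
  G.Text := '';
  G.Dirty := False;
  G.Rebuilds := 0;

  SetText(G, 'Hello');
  Update(G);  // rebuilds
  Update(G);  // no work: text unchanged
  Update(G);  // no work: text unchanged
  SetText(G, 'Hello!');
  Update(G);  // rebuilds again
  WriteLn(G.Rebuilds, ' ', Length(G.Vertices));
end.
```

With this shape, the per-frame cost for unchanged text is a single Boolean check, which is consistent with the large frame rate difference described above when buffering is turned on.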