Hi David,
>> so basically you want to find out how fast you can
>> update blocks of 64 000 pixels
>
> I never said that...

What I mean is that you changed ALL pixels to make sure you know what
the frame rate is in the WORST case, when ALL 320 x 200 pixels
actually need updates.

>> You probably want to optimize that copying routine.
> Not sure what that is.

Some people do really heavy tweaking to speed up games:

http://archive.gamedev.net/archive/reference/articles/article817.html

Note that this is sort of outdated, as most game people now worry
about optimum use of 3d graphics chipsets.

>> In C, you can at least use logic calculations, to
>> avoid having to do "if then" for every single pixel.
> Not sure what you mean here.

Roughly the following: If necessary, you first expand your
transparency mask into a format which has one byte per pixel, with
each byte being either 0x00 or 0xff. Then you negate the mask to get
a one byte per pixel mask of the opposite of transparency. Then you
compute something like

1) result pixels = (old pixels & mask1) | (new pixels & mask2)

where mask1 and mask2 are the negated forms of each other. This even
works for alpha masks:

2) result pixels = (old pixels "*" mask1) "+" (new pixels "*" mask2)

The trick is that you can do all operations using data types which
are big enough for SEVERAL PIXELS. It means you can calculate the
updated values for, for example, FOUR pixels in ONE step.

For the alpha mask version, this requires the ability to treat a
32 bit value as a vector of four 8 bit values. This is exactly what
MMX does: Like a floating point coprocessor which is specialized on
floating point calculations, MMX is a CPU component which is
specialized on vectors :-) So the "*" and "+" must work on "bytes in
a longer data type". A normal 386 "add" or "mul" would fail, because
carries would spill from one byte into the next.
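As a rough sketch of formula 1) in plain C: the names blend_masked,
dst, src and mask are made up for illustration and are not from any
existing program. It assumes 8 bit pixels and a mask with one byte per
pixel, where 0xff means "take the new pixel" and 0x00 means "keep the
old pixel":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: merge new pixels (src) into old pixels (dst).
 * mask has one byte per pixel: 0xff = use src, 0x00 = keep dst.
 * count is the number of pixels. */
static void blend_masked(uint8_t *dst, const uint8_t *src,
                         const uint8_t *mask, size_t count)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL && mask != NULL);

    /* Main loop: four pixels per step, no "if" per pixel. */
    for (; i + 4 <= count; i += 4) {
        uint32_t d, s, m;
        memcpy(&d, dst + i, 4);  /* memcpy keeps unaligned access safe */
        memcpy(&s, src + i, 4);
        memcpy(&m, mask + i, 4);
        d = (d & ~m) | (s & m);  /* ~m plays the role of mask1, m of mask2 */
        memcpy(dst + i, &d, 4);
    }

    /* Leftover pixels when count is not a multiple of four. */
    for (; i < count; i++)
        dst[i] = (uint8_t)((dst[i] & ~mask[i]) | (src[i] & mask[i]));
}
```

A decent compiler should turn the 4 byte memcpy calls into single
32 bit loads and stores, but that is worth checking in the generated
code for your target.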

Note that MMX uses 64 bit values and newer stuff such as SSE uses
even longer values, so you can do yet more pixels in parallel :-)

The problem is that MMX, SSE and other things are often not well
supported by compilers, so you would have to manually write special
code. HOWEVER, the first (non-alpha) variant, which only has yes / no
decisions, works with ALL COMPILERS which have a 32 bit integer data
type :-) Of course it only gives you the expected speed when the
compiler knows how to use 32 bit integers efficiently on 386 and
newer CPUs.

> Maybe I need to check again, but I'm pretty sure VGA RAM
> is considered outside my allocated memory.

In DJGPP, you can request a mapping of the VGA RAM to a normal
pointer. Then you can use it as if it were part of your allocated
memory. Using macros for a low level global memory peek or poke
is much slower. Here are some snippets from an old program of mine:

  #include <dpmi.h>        /* stuff with __dpmi_... names */
  #include <dos.h>         /* int86, union REGS */
  #include <pc.h>          /* things like inportb() */
  #include <go32.h>        /* in case you want to access _dos_ds */
  #include <sys/farptr.h>  /* e.g. _farpeekb(_dos_ds or other, offset) */

  __dpmi_meminfo memory_mapping;
  int lfbSel;

  memory_mapping.address = vesamode.lfbPTR; /* physical linear address */
  memory_mapping.size = ( (vesamode.bytes_line * vesamode.height)
    + 65535) & (uint32)0xffff0000; /* round up to multiple of 64k */

For VGA, you would just say address=0xa0000, size=0x10000, obviously.

  __dpmi_physical_address_mapping(&memory_mapping); // fail if != 0
  __dpmi_lock_linear_region(&memory_mapping);
  // for memory below 1 MB, this just made 1:1 mappings,
  // but you SHOULD use the LDT to stay more compatible:
  lfbSel = __dpmi_allocate_ldt_descriptors(1); /* alloc 1 slot */
  __dpmi_set_segment_base_address(lfbSel, memory_mapping.address);
  __dpmi_set_segment_limit(lfbSel, memory_mapping.size - 1);

Now you can use _farpokeb(lfbSel, offset, value) for single bytes,
_farpokew(...) for units of 16 bit and _farpokel(...) for units of
32 bit. You can also use other "far" stuff, but I have to admit that
the example still is a bit tedious, using far pointers. Of course
there is also _farpeekb(selector, offs) and _farpeekw(...) and
_farpeekl(...) for reading the memory.

There is also good documentation about all this online :-) It sounds
a bit complicated, but it is worth using in DJGPP.

> "Optimizing" by using a better video mode that *might not
> be supported by the hardware* is not a real answer.

What I mean is: It is possible to use complicated VGA tricks to have
multiple buffers and page flipping, but given how rare non-VESA
hardware is, I would say it is sufficient to NOT try TOO hard to
optimize for VGA and only "optimize a bit" for VGA. VESA is faster
anyway, and only a few users will suffer from slower performance
when your program has to use VGA mode.

Regards, Eric