Hi David,

>> so basically you want to find out how fast you can
>> update blocks of 64 000 pixels
> 
> I never said that...

What I mean is that you changed ALL pixels so that you
know the frame rate in the WORST case, when ALL
320 x 200 pixels actually need updates.

>> You probably want to optimize that copying routine.

> Not sure what that is.

Some people do really heavy tweaking to speed up games:

http://archive.gamedev.net/archive/reference/articles/article817.html

Note that this is somewhat outdated, as most game developers
now worry about making optimal use of 3D graphics chipsets.

>> In C, you can at least use logic calculations, to
>> avoid having to do "if then" for every single pixel.

> Not sure what you mean here.

Roughly the following: If necessary, you first expand your
transparency mask into a format which has one byte per pixel.

Then you negate the mask to get a one byte per pixel mask of
the opposite of transparency. Then you compute something like

1) result = (old pixels & mask1) | (new pixels & mask2)

where mask1 is 0xFF for every transparent pixel (keep the old
value) and 0x00 for every opaque one, and mask2 = ~mask1 is its
bitwise negation, so each byte comes from exactly one source.
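A minimal C sketch of formula 1), assuming 8 bit palette pixels and a mask that is already expanded to one byte per pixel, with 0xFF where the sprite is transparent and 0x00 where it is opaque. blit_masked is a hypothetical name. It handles four pixels per 32 bit step and uses memcpy for the word loads and stores, which avoids alignment trouble and which optimizing compilers usually turn into plain 32 bit moves.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: merge sprite pixels into dst without a per-pixel "if".
   mask[i] is 0xFF where the sprite is transparent (keep dst[i]),
   0x00 where it is opaque (take src[i]). n is the number of pixels. */
void blit_masked(uint8_t *dst, const uint8_t *src,
                 const uint8_t *mask, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {        /* four 8 bit pixels per step */
        uint32_t d, s, m1;
        memcpy(&d, dst + i, 4);
        memcpy(&s, src + i, 4);
        memcpy(&m1, mask + i, 4);
        uint32_t m2 = ~m1;              /* mask2 = NOT mask1 */
        d = (d & m1) | (s & m2);        /* formula 1) on 4 pixels at once */
        memcpy(dst + i, &d, 4);
    }
    for (; i < n; i++)                  /* leftover pixels, one at a time */
        dst[i] = (uint8_t)((dst[i] & mask[i]) | (src[i] & (uint8_t)~mask[i]));
}
```

Because only AND, OR and NOT are used, byte order does not matter: each mask byte always lines up with its pixel byte.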
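This even works for alpha masks, with mask1 and mask2 as
complementary weights (for example 255 - alpha and alpha):

2) result = (old pixels "*" mask1) "+" (new pixels "*" mask2)

after which the sum is scaled back down, e.g. divided by 255.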
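A scalar C sketch of formula 2) for one 8 bit color channel, assuming direct color pixels (separate R, G, B bytes); 8 bit palette indices as in mode 13h cannot be blended arithmetically like this. mask2 is the alpha value (weight of the new pixel) and mask1 = 255 - alpha. The function names are made up for illustration; the loop is the byte-by-byte work that MMX would do several bytes at a time.

```c
#include <stdint.h>
#include <stddef.h>

/* Formula 2) for one channel: mask2 = alpha, mask1 = 255 - alpha.
   Dividing by 255 scales the weighted sum back into 0..255; +127 rounds. */
uint8_t blend_channel(uint8_t old_px, uint8_t new_px, uint8_t alpha)
{
    unsigned mask2 = alpha;
    unsigned mask1 = 255u - alpha;
    return (uint8_t)((old_px * mask1 + new_px * mask2 + 127u) / 255u);
}

/* Plain loop over n channel bytes, one byte per step. */
void blend_bytes(uint8_t *dst, const uint8_t *src,
                 const uint8_t *alpha, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = blend_channel(dst[i], src[i], alpha[i]);
}
```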

The trick is that you can do all operations using data
types which are big enough for SEVERAL PIXELS. This means
you can calculate the updated values for, for example,
FOUR pixels in ONE step. For the alpha mask version,
this requires the ability to treat a 32 bit value as
a vector of four 8 bit values. This is exactly what
MMX does: just as a floating point coprocessor is
specialized in floating point calculations, MMX is
a CPU extension specialized in vectors :-)
So the "*" and "+" must work on "bytes in a longer
data type". A normal 386 "add" or "mul" would fail,
because carries spill over from one byte into the next.
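A small C sketch of why a normal add fails: the carry from one byte leaks into its neighbour. add_bytes is a well known bit trick (a hypothetical name here) that does a wrapping per-byte add inside an ordinary 32 bit integer; it only covers addition and is shown for illustration. MMX does the same for eight bytes in one instruction (PADDB).

```c
#include <stdint.h>

/* A plain 32 bit add: a carry out of one byte lane spills into the next. */
uint32_t add_plain(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Per-byte wrapping add in a plain 32 bit integer:
   adding only the low 7 bits of each byte cannot carry out of the lane,
   then XOR fixes up the top bit of each byte. */
uint32_t add_bytes(uint32_t a, uint32_t b)
{
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    return low7 ^ ((a ^ b) & 0x80808080u);
}
```

For example, with a = 0x000000F0 and b = 0x00000020, add_plain gives 0x00000110 (the carry corrupted the second pixel), while add_bytes gives 0x00000010. MMX also has PADDUSB, which saturates at 255 instead of wrapping.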

Note that MMX uses 64 bit values, and newer extensions such
as SSE use even longer (128 bit) values, so you can process yet
more pixels in parallel :-) The problem is that MMX, SSE and
similar extensions are often not well supported by compilers,
so you would have to write special code by hand.
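For compilers that do support it, GCC and Clang provide SSE2 intrinsics in <emmintrin.h>, which avoid hand-written assembly. A sketch of formula 1) for 16 pixels per step, assuming an x86 target with SSE2 enabled; elsewhere the plain C loop handles everything. Whether a particular DJGPP GCC build accepts this is an assumption I have not checked, and blit_masked16 is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Formula 1) with SSE2: mask[i] == 0xFF means transparent (keep dst[i]). */
void blit_masked16(uint8_t *dst, const uint8_t *src,
                   const uint8_t *mask, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {      /* 16 pixels per step */
        __m128i d  = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i s  = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i m1 = _mm_loadu_si128((const __m128i *)(mask + i));
        /* (d AND m1) OR ((NOT m1) AND s); _mm_andnot_si128(a, b) = ~a & b */
        __m128i r  = _mm_or_si128(_mm_and_si128(d, m1),
                                  _mm_andnot_si128(m1, s));
        _mm_storeu_si128((__m128i *)(dst + i), r);
    }
#endif
    for (; i < n; i++)   /* tail, or all pixels when SSE2 is unavailable */
        dst[i] = (uint8_t)((dst[i] & mask[i]) | (src[i] & (uint8_t)~mask[i]));
}
```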

HOWEVER, the first (non-alpha) variant, which only has
yes / no decisions, works with ALL COMPILERS which have
a 32 bit integer data type :-) Of course it only gives
you the expected speed when the compiler knows how to
use 32 bit integers efficiently on 386 and newer CPUs.

> Maybe I need to check again, but I'm pretty sure VGA RAM
> is considered outside my allocated memory.

In DJGPP, you can request a mapping of the VGA RAM to
a normal pointer. Then you can use it as if it were
part of your allocated memory. Using per-byte macros for
low level global memory peeks or pokes is much slower.
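One way to get such a plain pointer is DJGPP's near pointer support from <sys/nearptr.h>. A sketch, assuming mode 13h is already set; map_vga and put_pixel are hypothetical names. The near pointer trick switches off memory protection and some DPMI hosts refuse it, so the selector approach in the snippets that follow is the more compatible one. The base value may also change after memory allocation under some sbrk settings, so recomputing the pointer is safer than caching it. Outside DJGPP, the sketch falls back to an ordinary array so the pixel code can be tried on a host.

```c
#include <stdint.h>
#include <stddef.h>

#define VGA_W 320
#define VGA_H 200

#ifdef __DJGPP__
#include <sys/nearptr.h>
/* Returns a normal pointer to VGA RAM at physical 0xA0000, or NULL.
   Call __djgpp_nearptr_disable() when done to restore protection. */
uint8_t *map_vga(void)
{
    if (!__djgpp_nearptr_enable())
        return NULL;                    /* e.g. refused by the DPMI host */
    return (uint8_t *)(0xA0000 + __djgpp_conventional_base);
}
#else
/* Host stand-in for VGA RAM, only for trying the code outside DOS. */
static uint8_t host_screen[VGA_W * VGA_H];
uint8_t *map_vga(void) { return host_screen; }
#endif

/* Mode 13h layout: one byte per pixel, rows of 320 bytes, no bounds check. */
void put_pixel(uint8_t *vga, int x, int y, uint8_t color)
{
    vga[y * VGA_W + x] = color;
}
```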

Here are some snippets from an old program of mine:

#include <dpmi.h> /* stuff with __dpmi_... names */
#include <dos.h> /* int86, union REGS */
#include <pc.h> /* things like inportb() */
#include <go32.h> /* in case you want to access _dos_ds */
#include <sys/farptr.h> /* e.g. _farpeekb(_dos_ds or other, offset) */

__dpmi_meminfo memory_mapping;
int lfbSel;

memory_mapping.address = vesamode.lfbPTR; /* physical address of the LFB */
memory_mapping.size = ( (vesamode.bytes_line * vesamode.height)
 + 65535) & 0xffff0000UL;    /* round up to multiple of 64k */

For VGA, you would just say address=0xa0000, size=0x10000, obviously.

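__dpmi_physical_address_mapping(&memory_mapping); // returns -1 on failure: check it!
__dpmi_lock_linear_region(&memory_mapping);       // also returns -1 on failure
// on success, memory_mapping.address now holds the LINEAR address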

// for memory below 1 MB, this just made 1:1 mappings,
// but you SHOULD use the LDT to stay more compatible:

lfbSel = __dpmi_allocate_ldt_descriptors(1); /* alloc 1 slot, -1 on failure */
__dpmi_set_segment_base_address(lfbSel, memory_mapping.address);
__dpmi_set_segment_limit(lfbSel, memory_mapping.size - 1);

Now you can use _farpokeb(lfbSel, offset, value) for single
bytes, _farpokew(...) for 16 bit units and _farpokel(...)
for 32 bit units. There are other "far" functions as well, but
I have to admit that working through far pointers like this
is still a bit tedious. For reading the memory, there are
_farpeekb(selector, offset), _farpeekw(...) and _farpeekl(...).
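When copying a whole back buffer, the "ns" variants from <sys/farptr.h> save reloading the selector for every poke: set it once with _farsetsel(), then use _farnspokel(). A sketch, assuming mode 13h and the conventional memory selector _dos_ds (with lfbSel from the snippet above, the offsets would start at 0 instead of 0xA0000); present is a hypothetical name. dosmemput(back, 64000, 0xA0000) from <sys/movedata.h> is a one-call alternative. Outside DJGPP the sketch just copies into an ordinary array.

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_BYTES (320 * 200)

#ifdef __DJGPP__
#include <go32.h>          /* _dos_ds */
#include <sys/farptr.h>    /* _farsetsel, _farnspokel */
/* Copy a 320x200 back buffer to VGA RAM, 32 bits per poke. */
void present(const uint8_t *back)
{
    _farsetsel(_dos_ds);                /* load the selector once */
    for (unsigned long off = 0; off < SCREEN_BYTES; off += 4) {
        uint32_t v;
        memcpy(&v, back + off, 4);
        _farnspokel(0xA0000 + off, v);
    }
}
#else
/* Host stand-in for VGA RAM, only for trying the code outside DOS. */
uint8_t host_vga[SCREEN_BYTES];
void present(const uint8_t *back) { memcpy(host_vga, back, SCREEN_BYTES); }
#endif
```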

There is also good documentation about all this online :-)
It sounds a bit complicated, but it is worth using in DJGPP.

> "Optimizing" by using a better video mode that *might not
> be supported by the hardware* is not a real answer.

What I mean is: It is possible to use complicated VGA
tricks to have multiple buffers and page flipping, but
given how rare non-VESA hardware is, I would say it is
sufficient to NOT try TOO hard to optimize for VGA and
to only "optimize a bit" for it. VESA is faster anyway,
and only a few users will suffer from slower performance
when your program has to fall back to VGA mode.

Regards, Eric


_______________________________________________
Freedos-devel mailing list
Freedos-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-devel
