Hi list,

On 2022-07-12 at 16:29 +0000, Bret Johnson wrote:
> For TSR's, there are additional things you can do to reduce memory.  You can
> look at the source code for my PRTSCR program (available at
> http://bretjohnson.us) that uses a BUNCH of tricks.  For example, it doesn't
> even use the DOS TSR interrupt.
>
> The way it works is to make a "copy" of itself at the top of conventional memory, terminates itself (using a normal DOS
> terminate process, which includes deleting the original PSP), and then continues running from the "copy".  The
> "copy" decides where the best place in memory is to load the TSR (which can even be in one or more "memory
> holes" left by some other program or in upper memory), allocates appropriate memory block(s), and then installs itself in
> the allocated memory.  I learned that technique from ECM a long time ago.  It's much more complicated than a "normal"
> TSR installation, but is much more efficient in terms of ultimate memory use.

I developed this optimal installation handling first for RxANSI, based on Henrik Haftmann's ANSI, which I forked starting in 2008 [1]. I eventually adapted it into the TSR example [2] and then also for some other TSRs such as lClock [3].

Note that the resident block installed this way doesn't have a PSP at all, just an MCB with itself as the owner and a DOS v4+ style MCB name for MEM-type programs.


Other notes on your TSR and application:

Your GRAFTABL uses interrupt 21h service 31h to stay resident. You do free the environment, but to be on the safe side it is better to also clear the process's environment field to zero. Example code is in my TSR example [4]:

        xor ax, ax
        xchg ax, word [ cs:2Ch ]; set PSP field to zero
        mov es, ax
        mov ah, 49h
        int 21h                 ; Free our environment


Further, you calculate the size of the PSP block to keep resident by using test, add, and shifts:

          mov       dx, trans           ; Allow everything preceding the
          test      dx, 0x000F          ;   transient portion of GRAFTABL to
          jz        .20                 ;   remain resident. (Rounded up to
          add       dx, 0x10            ;   the next paragraph, because MS-DOS
.20:      mov       cl, 4               ;   wants the size in paragraphs.)
          shr       dx, cl
          mov       ax, 0x3100
          int       0x21                ; TSR EXIT CODE 0

It is more efficient to change this like so:

        mov dx, trans
        add dx, 15
        mov cl, 4
        shr dx, cl

The "add dx, 15" makes the shr round up in its division. (I expect that the addition won't overflow here.) However, you can easily change it so that the addition is done by the assembler at build time, using "mov dx, trans + 15".
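
As a rough sketch of that build-time variant: the 1024-byte placeholder and the exact layout below are only stand-ins I made up, not your actual GRAFTABL source.

```nasm
        cpu 8086
        org 100h                ; .COM file, loaded behind the 256-byte PSP

entry:  jmp trans
        times 1024 - ($ - entry) db 0   ; stand-in for the resident part

trans:                          ; transient part starts here
        mov dx, trans + 15      ; NASM adds the 15 at build time
        mov cl, 4
        shr dx, cl              ; dx = paragraphs to keep, rounded up
        mov ax, 3100h
        int 21h                 ; stay resident, exit code 0
```

In this sketch trans ends up at 0500h, so the result is (0500h + 15) >> 4 = 50h paragraphs. A label at 0503h would give 51h, which is the same value your test/jz/add sequence produces.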

Moreover, while it is not supported in the most obvious way (like "mov dx, (trans + 15) / 16") you can teach NASM to do the entire calculation (shift or division and all) by calculating a scalar length of the program (as opposed to a relocatable symbol like the one created by your "trans:" label). There's an example of this in the DVORAK TSR (under GNU GPL v2+) by Donald Bindner [5] that goes as follows:


even 16
end_of_resident equ ($-$$ + 0100h)
...
    mov dx, end_of_resident/16      ; number of paragraphs to keep
    mov ax, 3100h                   ; terminate resident w/ 0 return code
    int 21h

As you can see, NASM allows dividing the scalar value. (Rounding up is not needed in this particular calculation because they already aligned the position of the end_of_resident equate to a 16-byte boundary.)

Here is another example [6], in FDAPM (by Eric Auer) which I extended with some but not all of my TSR ideas:

        mov dx, (eofTSR - start + 256 + 15) >> 4
                        ; +256 for PSP, start is at offset 100h
        mov ax,3101h    ; go TSR, errorlevel 1
        int 21h


In my TSRs and other applications I use more sophisticated calculations, some much more so. These are often based on different sections or segments of the program. I generally calculate deltas to address different parts and hardcode resulting values into the program at build time. However, the suboptimal way of calculating the number of paragraphs at run time is a very common oversight.


Another problem (which is also little known) is that your use of interrupt 21h service 31h will retain all of your currently open process handles, as well as the entire system's System File Table (SFT) entries associated with these. This is not a problem by default because all your handles will be DUPlicated from the parent's, for your stdin, stdout, stderr, stdaux, and stdprn. That means they will share the same SFT entries as already used by the shell. However, if the user runs your program with output redirection (either to a file or a character device such as NUL), as in "graftabl > nul", then you will leak the SFT entry which was reserved for your process to use.

I assume that the people involved in the design of this DOS service expected that TSRs would generally want to keep around their PSPs, so that they could swap processes and then use their own handles as preserved in their process handle tables. However, in practice most TSRs never re-use their PSPs after the DOS TSR termination handling is done. So in that case, as it is for your application, you should explicitly free all handles before terminating.

Here's how I solved it [7] in FDAPM (and with equivalent code in FreeDOS SHARE):

        xor bx, bx              ; = 0
        mov cx, word [32h]      ; get amount of handles
.loop:
        mov ah, 3Eh
        int 21h                 ; close it
        inc bx                  ; next handle
        loop .loop              ; loop for all process handles -->


Further, you can certainly re-use the "zero page" (PSP) space starting at offset 80h, which holds the command line tail by default but is also the default DTA for your process. That latter fact is a big hint that DOS doesn't need this buffer to be preserved. Even more space at the tail of the PSP can be re-used. DOS enforces a minimum resident size for the service 31h PSP allocation of 60h bytes [8], and the only known use of the space after that is for the default unopened FCBs [9]. Even the additional 16 bytes down to 50h can probably be overwritten safely.
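
To illustrate re-using the area at 80h, here is a minimal sketch using NASM's "absolute" directive. The name table_buffer is made up for this example, and it assumes that the command line tail has already been parsed (or is not needed) by the time the buffer gets overwritten.

```nasm
        cpu 8086
        org 100h

        absolute 80h            ; PSP offsets 80h to 0FFh: tail / default DTA
table_buffer:   resb 128        ; hypothetical name for the re-used space

        section .text           ; switch back to emitting code at 100h
entry:
        cld
        mov di, table_buffer    ; es = ds = PSP segment on .COM entry
        mov cx, 128 / 2
        xor ax, ax
        rep stosw               ; overwrite the old command line tail
        mov ax, 4C00h
        int 21h                 ; plain exit; a TSR would use service 31h instead
```

Nothing is emitted into the executable for the "absolute" part, it only assigns the label an address within the PSP.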


Some more notes on your TSR:

In your executable entrypoint you have a near jump to skip the buffer later used for the table data:

entry:    jmp       trans
          db        1021 dup 0x00

It is implied by the size of the buffer that the jump must be near, so it takes 3 bytes, and then you add another 1021 bytes to get a total of 1024 bytes. (I think the "dup" syntax is only supported by recent versions of NASM, but that's not important.) However, I'd prefer to use some calculation to get NASM to reliably fill the buffer, such as:

entry:  jmp trans
        times 1024 - ($ - entry) db 0


Alternatively, you could use my fill macro [10] like this:

entry:  fill 1024, 0, jmp trans


(Also, as I think you already suggested in this thread, to optimise the transient executable size you could put one of the tables into this buffer to save 1 KiB at the end of the executable. Only the jump needs to stay; you could later re-initialise those 3 bytes to hold the correct table start. Or stash some of the messages in there, as long as they're shorter than the 1 KiB size.)

You do not use an IBM Interrupt Sharing Protocol (IISP) [11] header for your interrupt hook. Therefore, you could optimise this part a bit, from:

old2F:    dd        0xFFFF0000
...
.1:       jmp far   [cs:old2F]


Into this:

.1:     jmp 0:0
old2F equ $ - 4


This is some self-modifying code (SMC) to stash the downlink into an immediate far jump instruction, instead of using the indirect far jump to refer to a different memory location in your code segment. The dollar sign is used in the equate to denote the current assembly position after the 5-byte instruction; it is offset by minus four so as to address the far pointer in the instruction's encoding. (As mentioned, you cannot do this if you use a standard IISP header, because that has a "jmp short $+18" (EBh 10h) instruction right in front of the downlink field.)
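
For completeness, the installer then has to write the previous handler's address into that jump instruction before hooking the interrupt. A minimal sketch, assuming a .COM program (so ds = cs) and re-using the new2F and old2F names; the install label and the empty handler body are placeholders of mine, and freeing the environment and closing the handles are left out for brevity:

```nasm
        cpu 8086
        org 100h

entry:  jmp install

new2F:                          ; resident interrupt 2Fh handler
        ; (function dispatch would go here)
.1:     jmp 0:0                 ; patched at install time: EAh, offset, segment
old2F equ $ - 4                 ; address of the far pointer inside the jump

install:                        ; transient part, not kept resident
        mov ax, 352Fh
        int 21h                 ; es:bx = current interrupt 2Fh handler
        mov word [old2F], bx    ; patch offset into the jump
        mov word [old2F + 2], es ; patch segment into the jump
        mov dx, new2F
        mov ax, 252Fh
        int 21h                 ; hook interrupt 2Fh to ds:dx
        mov dx, (install - $$ + 100h + 15) >> 4
        mov ax, 3100h
        int 21h                 ; stay resident, keep everything before install
```

The downlink is written long before the handler first runs, so the patched instruction is not expected to sit in the prefetch queue at that point.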


Next, you're using "or al, al" to check a register for zero. However, it is more idiomatic [12] to use "test al, al" instead, which right away hints (to a reader or even to the processor) that no change of the register occurs.


Additionally, you're using two instructions of the test or compare types to dispatch down three different paths. (These are handling the function 00h call, or the function 01h call, or anything else.) One comparison instruction suffices to do that however. Observe:

new2F:
...
        cmp al, 1
        ja .chain
        je .function_01
.function_00:


Besides your earlier use of "dup" syntax, you are also using "mov word ds:[bx], entry". The segment override outside the brackets is another MASMism that NASM has recently learned to support. However, I prefer the segment override within the brackets, as in "mov word [ds:bx], entry".


> PRTSCR also includes the ability for the TSR to allocate memory blocks in
> Expanded Memory (EMS) or Extended Memory (EMS, though this happens indirectly
> through the use of DOS Protected Mode Services or DPMS).

I think you meant XMS as the abbreviation of "Extended Memory" here, Bret.

> Using these techniques, you can actually have a complicated TSR that requires
> LOTS of data but only a small part of the data (and code) requires the use of
> conventional (or even upper) memory.
>
> I'm still experimenting with the EMS & DPMS things so don't think that part is
> necessarily "good to go", but it is something you can experiment with if you want.
> I'm also converting the code from A86 to NASM, and the code on the web site is in A86
> (actually, A386) format so you would need modify it to work with some other assembler.

One of my applications, the lDebug debugger (with a small "L"), will use XMS for two features: The video screen swapping recently copied from FreeDOS Debug, and the symbolic debugging support that is not yet included in the builds of lDebug I prepare on our server.

The symbol tables for the symbolic option may require lots of memory. I capped this at 256 KiB for 86 Mode memory (i.e., the first 1024 or 1088 KiB, as addressable directly in Real or Virtual 86 Mode), but the XMS possibility supports symbol tables up to the maximum of 2 MiB (plus transfer buffer), which maxes out the 16-bit indices used to refer to the symbol main, hash, and string data. I'm approaching the 64 KiB segment limit both for my code segment and my entry/data/stack segment, so it is a very good thing to not cram the symbol tables into that. Even using additional segments, there is no way to fit 2 MiB into the 86 Mode memory.

The way I support XMS is by defining a small (260 bytes) buffer in the 86M memory of my data segment, called the access slice. Any access to the symbol tables (either main, hash, or string entries) goes through some functions that take an index to read from the symbol tables and return a far pointer. (The reason for not hardcoding the access slice address, and using the pointer instead, is to allow addressing 86 Mode memory symbol tables directly while XMS symbol tables use the access slice method, without duplicating all the "business logic" for each method.) The appropriate data is copied from the XMS allocation's space into the access slice. If the application wants to modify some part of the symbol tables, it first requests the access slice be filled, then modifies the data in the access slice, and then calls another function to copy back the changes from the access slice to the symbol tables.

Regards,
ecm


[1]: https://hg.pushbx.org/ecm/rxansi
[2]: https://hg.pushbx.org/ecm/tsr
[3]: https://hg.pushbx.org/ecm/lclock
[4]: https://hg.pushbx.org/ecm/tsr/file/daca203fa216/transien.asm#l962
[5]: https://sand.truman.edu/~dbindner/freeware/
[6]: https://hg.pushbx.org/ecm/fdapm/file/62a7d769a9f6/source/fdapm/fdapm.asm#l154
[7]: https://hg.pushbx.org/ecm/fdapm/file/62a7d769a9f6/source/fdapm/fdapm.asm#l146
[8]: https://retrocomputing.stackexchange.com/questions/20001/how-much-of-the-program-segment-prefix-area-can-be-reused-by-programs-with-impun/20006#20006
[9]: https://fd.lod.bz/rbil/interrup/dos_kernel/2126.html
[10]: https://hg.pushbx.org/ecm/lmacros/file/9fa0e64034cd/lmacros1.mac#l916
[11]: https://fd.lod.bz/rbil/interrup/tsr/2d.html
[12]: https://stackoverflow.com/questions/33721204/test-whether-a-register-is-zero-with-cmp-reg-0-vs-or-reg-reg


_______________________________________________
Freedos-devel mailing list
Freedos-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-devel