Re: [Freedos-devel] Another implementation of GRAFTABL

C. Masloch Tue, 12 Jul 2022 11:57:29 -0700

Hi list,

On at 2022-07-12 16:29 +0000, Bret Johnson wrote:

> For TSR's, there are additional things you can do to reduce memory.  You can
> look at the source code for my PRTSCR program (available at
> http://bretjohnson.us) that uses a BUNCH of tricks.  For example, it doesn't
> even use the DOS TSR interrupt.


> The way it works is to make a "copy" of itself at the top of conventional memory, terminates itself (using a normal DOS
> terminate process, which includes deleting the original PSP), and then continues running from the "copy".  The
> "copy" decides where the best place in memory is to load the TSR (which can even be in one or more "memory
> holes" left by some other program or in upper memory), allocates appropriate memory block(s), and then installs itself in
> the allocated memory.  I learned that technique from ECM a long time ago.  It's much more complicated than a "normal"
> TSR installation, but is much more efficient in terms of ultimate memory use.

I developed this optimal installation handling first for RxANSI, based on Henrik Haftmann's ANSI, which I forked starting in 2008 [1]. I eventually adapted it into the TSR example [2] and then also for some other TSRs such as lClock [3].

Note that the resident block installed this way doesn't have a PSP at all, just an MCB with itself as the owner and a DOS v4+ style MCB name for MEM type programs.
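
As a rough sketch of what "an MCB with itself as the owner" and an MCB name amount to (this is not the code from the TSR example; the register inputs are assumptions made for illustration), the MCB sits one paragraph below the allocated block, with the owner word at offset 1 and the 8-byte DOS v4+ name field at offset 8:

```nasm
; INP:  es = segment of the newly allocated resident block (assumed)
;       ds:si -> 8-byte name, padded with zeros (assumed)
        mov ax, es
        dec ax
        mov es, ax              ; es => MCB, one paragraph below the block
        inc ax                  ; ax = segment of the block again
        mov word [es:1], ax     ; MCB owner field = the block itself
        mov di, 8               ; es:di -> MCB name field
        mov cx, 4
        cld
        rep movsw               ; copy the 8-byte name
```

With the owner set this way, MEM type programs can list the block under the given name even though no PSP exists for it.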



Other notes on your TSR and application:

Your GRAFTABL uses interrupt 21h service 31h to stay resident. You do free the environment, but to be on the safe side it is better to also clear the process's environment field to zero. Example code is in my TSR example [4]:


        xor ax, ax
        xchg ax, word [ cs:2Ch ]; set PSP field to zero
        mov es, ax
        mov ah, 49h
        int 21h                 ; Free our environment

Further, you calculate the size of the PSP block to keep resident by using test, add, and shifts:


          mov       dx, trans           ; Allow everything preceding the
          test      dx, 0x000F          ;   transient portion of GRAFTABL to
          jz        .20                 ;   remain resident. (Rounded up to
          add       dx, 0x10            ;   the next paragraph, because MS-DOS
.20:      mov       cl, 4               ;   wants the size in paragraphs.)
          shr       dx, cl
          mov       ax, 0x3100
          int       0x21                ; TSR EXIT CODE 0

It is more efficient to change this like so:

        mov dx, trans
        add dx, 15
        mov cl, 4
        shr dx, cl

The "add dx, 15" makes the shr round up in its division. (I expect that the addition won't overflow here.) However, you can easily change it so that the addition is done by the assembler at build time, using "mov dx, trans + 15".
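
Spelled out as a minimal sketch (re-using the "trans" label from your source and assuming the sum fits in 16 bits), the intermediate variant with the addition done at build time would read:

```nasm
        mov dx, trans + 15      ; addition done by the assembler
        mov cl, 4
        shr dx, cl              ; dx = size in paragraphs, rounded up
        mov ax, 3100h           ; terminate and stay resident, return code 0
        int 21h
```

This still shifts at run time; only the "add" instruction is gone.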

Moreover, while it is not supported in the most obvious way (like "mov dx, (trans + 15) / 16") you can teach NASM to do the entire calculation (shift or division and all) by calculating a scalar length of the program (as opposed to a relocatable symbol like the one created by your "trans:" label). There's an example of this in the DVORAK TSR (under GNU GPL v2+) by Donald Bindner [5] that goes as follows:



even 16
end_of_resident equ ($-$$ + 0100h)
...
    mov dx, end_of_resident/16      ; number of paragraphs to keep
    mov ax, 3100h                   ; terminate resident w/ 0 return code
    int 21h

As you can see, NASM allows dividing the scalar value. (Rounding up is not needed in this particular calculation because they already aligned the position of the end_of_resident equate to a 16-byte boundary.)

Here is another example [6], in FDAPM (by Eric Auer) which I extended with some but not all of my TSR ideas:


        mov dx, (eofTSR - start + 256 + 15) >> 4
                        ; +256 for PSP, start is at offset 100h
        mov ax,3101h    ; go TSR, errorlevel 1
        int 21h

In my TSRs and other applications I use more sophisticated calculations, some much more so. These are often based on different sections or segments of the program. I generally calculate deltas to address different parts and hardcode resulting values into the program at build time. However, the suboptimal way of calculating the number of paragraphs at run time is a very common oversight.

Another problem (which is also little known) is that your use of interrupt 21h service 31h will retain all of your currently open process handles, as well as the system-wide System File Table (SFT) entries associated with these. This is not a problem by default because all your handles will be DUPlicated from the parent's, for your stdin, stdout, stderr, stdaux, and stdprn. That means they will share the same SFT entries as already used by the shell. However, if the user runs your program with output redirection (either to a file or a character device such as NUL), as in "graftabl > nul", then you will leak the SFT entry which was reserved for your process to use.

I assume that the people involved in the design of this DOS service expected that TSRs would generally want to keep around their PSPs, so that they could swap processes and then use their own handles as preserved in their process handle tables. However, in practice most TSRs never re-use their PSPs after the DOS TSR termination handling is done. So in that case, as it is for your application, you should explicitly free all handles before terminating.

Here's how I solved it [7] in FDAPM (and with equivalent code in FreeDOS SHARE):


        xor bx, bx              ; = 0
        mov cx, word [32h]      ; get amount of handles
.loop:
        mov ah, 3Eh
        int 21h                 ; close it
        inc bx                  ; next handle
        loop .loop              ; loop for all process handles -->

Further, you can certainly re-use the "zero page" (PSP) space starting at offset 80h, which holds the command line tail by default but is also the default DTA for your process. That latter fact is a big hint that DOS doesn't need this buffer to be preserved. Even more space at the tail of the PSP can be re-used. DOS enforces a minimum resident size for the service 31h PSP allocation of 60h bytes [8], and the only known use of the space after that is for the default unopened FCBs [9]. Even the additional 16 bytes down to 50h are probably fine to be overwritten.
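
One possible way to lay out resident data in that area with NASM is sketched below. The names psp_scratch and store are made up for illustration, and the sketch assumes a flat-format .COM program whose code lives in the default .text section, with cs addressing the PSP, and which is done parsing its command line before it writes there:

```nasm
        absolute 80h            ; following labels are offsets into the PSP
psp_scratch:    resb 80h        ; hypothetical buffer at 80h to 0FFh
                                ;  (command line tail / default DTA area)
        section .text           ; resume assembling into the code section

store:                          ; hypothetical resident helper
        mov byte [cs:psp_scratch], al   ; data lives in the PSP tail
        retn
```

Nothing is emitted into the executable for the "resb" space, so this shrinks both the file and the resident part that follows offset 100h.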



Some more notes on your TSR:

In your executable entrypoint you have a near jump to skip the buffer later used for the table data:


entry:    jmp       trans
          db        1021 dup 0x00

It is implied by the size of the buffer that the jump must be near, so it takes 3 bytes, and then you add another 1021 bytes to get a total of 1024 bytes. (I think the "dup" syntax is only supported by recent versions of NASM, but that's not important.) However, I'd prefer to use some calculation to get NASM to reliably fill the buffer, such as:


entry:  jmp trans
        times 1024 - ($ - entry) db 0


Alternatively, you could use my fill macro [10] like this:

entry:  fill 1024, 0, jmp trans

(Also, as I think you already suggested in this thread, for optimising the transient executable size you could put one of the tables into this buffer to save 1 KiB at the end of the executable. (Just the jump needs to stay, you could re-initialise it to hold the correct 3 bytes for the table start later.) Or stash some of the messages in there, as long as they're shorter than the 1 KiB size.)

You do not use an IBM Interrupt Sharing Protocol (IISP) [11] header for your interrupt hook. Therefore, you could optimise this part a bit, from:


old2F:    dd        0xFFFF0000
...
.1:       jmp far   [cs:old2F]


Into this:

.1:     jmp 0:0
old2F equ $ - 4

This is some self-modifying code (SMC) to stash the downlink into an immediate far jump instruction, instead of using the indirect far jump to refer to a different memory location in your code segment. The dollar sign is used in the equate to denote the current assembly position after the 5-byte instruction; it is offset by minus four so as to address the far pointer in the instruction's encoding. (As mentioned, you cannot do this if you use a standard IISP header, because that has a "jmp short $+18" (EBh 10h) instruction right in front of the downlink field.)
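
For illustration, the installation code could then patch the jump's operand along these lines. This is only a sketch, assuming that ds equals the segment holding the resident handler (as in a .COM program that stays in place) and re-using the names old2F and new2F from above:

```nasm
        mov ax, 352Fh
        int 21h                         ; es:bx = current interrupt 2Fh handler
        mov word [old2F], bx            ; patch offset into the far jump
        mov word [old2F + 2], es        ; patch segment into the far jump
        mov dx, new2F
        mov ax, 252Fh
        int 21h                         ; set interrupt 2Fh vector to ds:dx
```

The downlink is stored before the vector is changed, so the handler never runs with the placeholder 0:0 target still in the instruction.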

Next, you're using "or al, al" to check a register for zero. However, it is more idiomatic [12] to use "test al, al" instead, which right away hints (to a reader or even to the processor) that no change of the register occurs.

Additionally, you're using two instructions of the test or compare types to dispatch down three different paths. (These are handling the function 00h call, or the function 01h call, or anything else.) One comparison instruction suffices to do that however. Observe:


new2F:
...
        cmp al, 1
        ja .chain               ; al above 1: not handled, chain
        je .function_01         ; al equal to 1
.function_00:                   ; fall through: al is 0

Besides your earlier use of "dup" syntax, you are also using "mov word ds:[bx], entry". The segment override outside the brackets is another MASMism that NASM has recently learned to support. I prefer the segment override within the brackets however.
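
That is, the same store written in the form that older NASM versions accept as well would be:

```nasm
        mov word [ds:bx], entry ; segment override inside the brackets
```

(Because ds is already the default segment for a [bx] memory operand, I expect the explicit override could also be dropped here.)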

> PRTSCR also includes the ability for the TSR to allocate memory blocks in
> Expanded Memory (EMS) or Extended Memory (EMS, though this happens indirectly
> through the use of DOS Protected Mode Services or DPMS).


I think you meant XMS as the abbreviation of "Extended Memory" here, Bret.

> Using these techniques, you can actually have a complicated TSR that requires
> LOTS of data but only a small part of the data (and code) requires the use of
> conventional (or even upper) memory.
>
> I'm still experimenting with the EMS & DPMS things so don't think that part is
> necessarily "good to go", but it is something you can experiment with if you want.
> I'm also converting the code from A86 to NASM, and the code on the web site is in A86
> (actually, A386) format so you would need modify it to work with some other assembler.

One of my applications, the lDebug debugger (with a small "L"), will use XMS for two features: The video screen swapping recently copied from FreeDOS Debug, and the symbolic debugging support that is not yet included in the builds of lDebug I prepare on our server.

The symbol tables for the symbolic option may require lots of memory. I capped this at 256 KiB for 86 Mode memory (ie, the first 1024 or 1088 KiB, as addressable directly in Real or Virtual 86 Mode), but the XMS possibility supports symbol tables up to the maximum of 2 MiB (plus transfer buffer), which maxes out the 16-bit indices used to refer to the symbol main, hash, and string data. I'm approaching the 64 KiB segment limit both for my code segment and my entry/data/stack segment, so it is a very good thing to not cram the symbol tables into that. Even using additional segments, there is no way to fit 2 MiB into the 86 Mode memory.

The way I support XMS is by defining a small (260 bytes) buffer in the 86M memory of my data segment, called the access slice. Any access to the symbol tables (either main, hash, or string entries) goes through some functions that take an index to read from the symbol tables and return a far pointer. (The reason for not hardcoding the access slice address, and using the pointer instead, is to allow addressing 86 Mode memory symbol tables directly while XMS symbol tables use the access slice method, without duplicating all the "business logic" for each method.) The appropriate data is copied from the XMS allocation's space into the access slice. If the application wants to modify some part of the symbol tables, it first requests the access slice be filled, then modifies the data in the access slice, and then calls another function to copy back the changes from the access slice to the symbol tables.
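
To give an idea of the fill step, here is a much simplified sketch. It is not lDebug's actual code: all names are made up for illustration, it takes a byte offset rather than a table index, and it assumes the XMS entry point and handle were obtained earlier. It uses XMS function 0Bh (move extended memory block) with its parameter block:

```nasm
struc XMSMOVE                   ; parameter block of XMS function 0Bh
.count:         resd 1          ; number of bytes to move (must be even)
.srchandle:     resw 1          ; source handle, 0 = 86 Mode memory
.srcoffset:     resd 1          ; offset into block, or seg:off if handle 0
.dsthandle:     resw 1          ; destination handle, 0 = 86 Mode memory
.dstoffset:     resd 1          ; offset into block, or seg:off if handle 0
endstruc

SLICE_SIZE equ 260

xms_entry:      dd 0            ; assumed: set from interrupt 2Fh function 4310h
xms_handle:     dw 0            ; assumed: handle of the symbol table block
move:           times XMSMOVE_size db 0
access_slice:   times SLICE_SIZE db 0

; INP:  dx:ax = byte offset of the wanted data within the XMS block
;       ds = data segment holding the variables above
; OUT:  ax = 1 if the XMS move succeeded
;       es:bx -> access slice
fill_slice:
        mov word [move + XMSMOVE.count], SLICE_SIZE
        mov word [move + XMSMOVE.count + 2], 0
        mov bx, word [xms_handle]
        mov word [move + XMSMOVE.srchandle], bx
        mov word [move + XMSMOVE.srcoffset], ax
        mov word [move + XMSMOVE.srcoffset + 2], dx
        mov word [move + XMSMOVE.dsthandle], 0
        mov word [move + XMSMOVE.dstoffset], access_slice
        mov word [move + XMSMOVE.dstoffset + 2], ds
        mov si, move            ; ds:si -> parameter block
        mov ah, 0Bh
        call far [xms_entry]    ; ax = 1 on success
        push ds
        pop es
        mov bx, access_slice    ; es:bx -> access slice
        retn
```

The write-back function would be the same call with the source and destination fields swapped. A variant for 86 Mode memory symbol tables could return a far pointer straight into the table instead, which is why callers only ever work with the returned pointer.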


Regards,
ecm


[1]: https://hg.pushbx.org/ecm/rxansi
[2]: https://hg.pushbx.org/ecm/tsr
[3]: https://hg.pushbx.org/ecm/lclock
[4]: https://hg.pushbx.org/ecm/tsr/file/daca203fa216/transien.asm#l962
[5]: https://sand.truman.edu/~dbindner/freeware/
[6]: https://hg.pushbx.org/ecm/fdapm/file/62a7d769a9f6/source/fdapm/fdapm.asm#l154
[7]: https://hg.pushbx.org/ecm/fdapm/file/62a7d769a9f6/source/fdapm/fdapm.asm#l146
[8]: https://retrocomputing.stackexchange.com/questions/20001/how-much-of-the-program-segment-prefix-area-can-be-reused-by-programs-with-impun/20006#20006
[9]: https://fd.lod.bz/rbil/interrup/dos_kernel/2126.html
[10]: https://hg.pushbx.org/ecm/lmacros/file/9fa0e64034cd/lmacros1.mac#l916
[11]: https://fd.lod.bz/rbil/interrup/tsr/2d.html
[12]: https://stackoverflow.com/questions/33721204/test-whether-a-register-is-zero-with-cmp-reg-0-vs-or-reg-reg



_______________________________________________
Freedos-devel mailing list
Freedos-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freedos-devel
