I want to open this to a wider audience, especially experts on the x86
in all of its flavors.

The question is, how do you reliably test the current state of the A20
gate?  Put more generally, how can you detect when two different
addresses actually refer to the same memory location?
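
To make the aliasing concrete, here is a minimal C model of real-mode address translation. It is only a sketch: the function name phys_addr and the a20_enabled flag are made up for illustration, and the gate is modelled simply as forcing address bit 20 to zero.

```c
#include <stdint.h>

/* Physical address of a real-mode segment:offset pair.
   a20_enabled == 0 models the gate forcing address bit 20 to zero
   (an assumption of this sketch, not a description of any chipset). */
static uint32_t phys_addr(uint16_t seg, uint16_t off, int a20_enabled)
{
    uint32_t linear = ((uint32_t)seg << 4) + off;   /* seg * 16 + off */

    if (!a20_enabled)
        linear &= ~(UINT32_C(1) << 20);             /* wrap at 1 MiB */
    return linear;
}
```

In this model FFFFh:10h maps to 100000h with the gate enabled and to 0 with it disabled, which is exactly the alias of 0:0 that the tests below rely on.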

The original HIMEM approach, which FDXMS also uses, is simply to
compare the 256 bytes at 0:0 with those at FFFFh:10h (using cmpsd),
and assume A20 is disabled if the bytes compare equal.  Let's call
this the "original approach".

This is simple, but it can give a "false negative": if the gate is
enabled but the bytes just happen to be the same, the test reports
A20 as disabled.

In the newer version of HIMEM (HIMEM64C), Michael grabbed some public
domain code which does this:

    xor     ax,ax
    mov     ds,ax
    dec     ax
    mov     es,ax           ; es->FFFF seg
    mov     ax,[es:10h]     ; read word at FFFF:10h, the 1M limit
    not     ax              ; ~1M word
    push    [WORD ds:0]     ; save word we're changing (INT 0 offset)
    mov     [ds:0],ax       ; save ~1M word to 0:0 (and FFFF:10h if !A20)
    mov     ax,[ds:0]       ; read back, may be unnecessary (forced memory access?)
    cmp     ax,[es:10h]     ; compare 0:0 word to 1M word, equal only if !A20
    pop     [WORD ds:0]     ; restore INT 0 offset

In case the comments are not clear...  The strategy is to read from
location [FFFFh:10h], complement it (bitwise NOT), write the result to
[0:0], read it back (for no immediately obvious reason), then compare
it to location [FFFFh:10h] to see if they are the same.  Let's call
this the "new approach".
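
The logic of both approaches can be checked against a small simulation. This is a sketch under stated assumptions, not the HIMEM or FDXMS code: the sim_* names and the byte array standing in for physical memory are invented here, and the model says nothing about timing, caches or reordering.

```c
#include <stdint.h>
#include <string.h>

#define SIM_MEM_SIZE 0x110000u  /* 1 MiB plus the 64 KiB above it */

static uint8_t sim_mem[SIM_MEM_SIZE]; /* simulated physical memory */
static int sim_a20;                   /* simulated gate: 1 = enabled */

/* Translate segment:offset, forcing bit 20 to zero when the gate is off. */
static uint32_t sim_linear(uint16_t seg, uint16_t off)
{
    uint32_t a = ((uint32_t)seg << 4) + off;
    return sim_a20 ? a : (a & ~(UINT32_C(1) << 20));
}

static uint16_t sim_rd16(uint16_t seg, uint16_t off)
{
    uint32_t a = sim_linear(seg, off);
    return (uint16_t)(sim_mem[a] | (sim_mem[a + 1] << 8));
}

static void sim_wr16(uint16_t seg, uint16_t off, uint16_t v)
{
    uint32_t a = sim_linear(seg, off);
    sim_mem[a] = (uint8_t)(v & 0xFF);
    sim_mem[a + 1] = (uint8_t)(v >> 8);
}

/* Original approach: 1 if the two 256-byte ranges differ. */
static int a20_original(void)
{
    return memcmp(&sim_mem[sim_linear(0x0000, 0x0000)],
                  &sim_mem[sim_linear(0xFFFF, 0x0010)], 256) != 0;
}

/* New approach: 1 if A20 looks enabled. */
static int a20_new(void)
{
    uint16_t saved = sim_rd16(0x0000, 0x0000);            /* push [0:0]  */
    uint16_t probe = (uint16_t)~sim_rd16(0xFFFF, 0x0010); /* not ax      */
    int enabled;

    sim_wr16(0x0000, 0x0000, probe);                      /* write 0:0   */
    enabled = sim_rd16(0x0000, 0x0000) != sim_rd16(0xFFFF, 0x0010);
    sim_wr16(0x0000, 0x0000, saved);                      /* pop [0:0]   */
    return enabled;
}
```

With the simulated memory all zeros and sim_a20 set to 1, a20_original() reports "disabled" (the false negative described earlier) while a20_new() reports "enabled".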

I can think of at least four potential issues with any code which
tries to detect the state of A20.

  1) Some systems are known to take "a long time" for the state of A20
     to actually change.  What happens if it changes while the
     detection code is executing?

  2) What happens if an interrupt occurs and memory is modified?  (Or
     if the routine is re-entered?)

  3) What effect might the cache have?

  4) What effect might instruction reordering have?

I am pretty sure issue 1 is not a problem, as long as the "new
approach" only writes to [0:0] and not [FFFFh:10h].  (Although I have
seen code on USENET which does the latter!)

Issue 2 (interrupts) is a potential problem for the original approach,
because a repeated cmpsd can be interrupted between iterations.  So if
A20 is disabled and the bytes in low memory change during an
interrupt, the comparison might fail and give the wrong answer.  With
the new code, there is a potential problem if an interrupt complements
the word at FFFFh:10h while the A20 gate is open.  One could argue
that none of this is very likely.
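
The second scenario (new approach, gate open) can be illustrated with a toy model. Everything here is hypothetical: two separate variables stand in for the words at 0:0 and FFFFh:10h with A20 enabled, and a callback stands in for an interrupt handler running between the write and the compare.

```c
#include <stdint.h>

static uint16_t low_word;   /* models the word at 0:0 */
static uint16_t high_word;  /* models the word at FFFFh:10h, A20 enabled */

/* A well-behaved "interrupt": touches nothing. */
static void irq_none(void) { }

/* The unlucky "interrupt": complements the word at FFFFh:10h. */
static void irq_complements_high(void) { high_word = (uint16_t)~high_word; }

/* New approach with the gate open; irq runs between write and compare.
   Returns 1 if A20 looks enabled. */
static int new_test_a20_on(void (*irq)(void))
{
    uint16_t saved = low_word;
    uint16_t probe = (uint16_t)~high_word;
    int enabled;

    low_word = probe;                   /* write ~[FFFFh:10h] to 0:0 */
    irq();                              /* interrupt window */
    enabled = (low_word != high_word);  /* compare 0:0 with FFFFh:10h */
    low_word = saved;                   /* restore 0:0 */
    return enabled;
}
```

In this model new_test_a20_on(irq_none) returns 1, while new_test_a20_on(irq_complements_high) returns 0 even though the gate is open.  On real hardware, running the test with interrupts disabled would close this particular window.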

Issue 3 (cache) is supposedly not a problem on the 486 and higher,
because the processor has an A20 mask input pin (A20M#) which is
applied at the very top of the memory hierarchy (before the L1 cache
lookup).  Or so I have read.  Caching may yet pose a problem on 386
machines, which is too bad because only 486 and higher have the wbinvd
instruction...

Finally, issue 4 (instruction reordering) explains the mysterious
"read back" in the new code.  Repeating the relevant lines:

    mov     [ds:0],ax       ; save ~1M word to 0:0 (and FFFF:10h if !A20)
    mov     ax,[ds:0]       ; read back, may be unnecessary (forced memory access?)
    cmp     ax,[es:10h]     ; compare 0:0 word to 1M word, equal only if !A20

Without the "read back", the processor could reorder the instructions
such that the write to [0:0] happens after the read from [FFFFh:10h].
(It can do this reordering, in theory, because the addresses are
different.)  Somebody must have figured that having a read which
depends on a write will prevent the write from being reordered past
the read.  In other words, since the second instruction depends on the
first and the third depends on the second, this code should execute in
order.

Now, I am not 100% convinced that this will work.  Is there any reason
to think a sufficiently "smart" processor will not make mincemeat of
this code sequence anyway?  I am especially worried about writing to
memory and reading it back into the same register; could a smart CPU
optimize away the read entirely, and then reorder the code?

Perhaps it would be better to rewrite the sequence like this:

    mov     [ds:0],ax   ; write to location 0:0
    mov     ax,[es:10h] ; read from location FFFFh:10h
    cmp     ax,[ds:0h]  ; compare with [0:0]

At least this way avoids the weird-looking "write to 0, read it
straight back" sequence.

But really, the only way to be sure might be to insert barrier
instructions...  I know CPUID would work, since it is a serializing
instruction, but it only exists on later 486 models and higher.

Is there any certain way to fix this which will work on 386+?

 - Pat


_______________________________________________
Freedos-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/freedos-devel
