I want to open this to a wider audience, especially experts on the x86 in all of its flavors.
The question is: how do you reliably test the current state of the A20 gate? Put more generally, how can you detect when two different addresses actually refer to the same memory location?

The original HIMEM approach, which FDXMS also uses, is simply to compare the 256 bytes at 0:0 with those at FFFFh:10h (using cmpsd), and to assume A20 is disabled if the bytes compare equal. Let's call this the "original approach". It is simple, but it can give a "false negative" if the gate is enabled and the bytes just happen to be the same.

In the newer version of HIMEM (HIMEM64C), Michael grabbed some public domain code which does this:

    xor ax,ax
    mov ds,ax
    dec ax
    mov es,ax             ; es->FFFF seg
    mov ax,[es:10h]       ; read word at FFFF:10h, the 1M limit
    not ax                ; ~1M word
    push [WORD ds:0]      ; save word we're changing (INT 0 offset)
    mov [ds:0],ax         ; save ~1M word to 0:0 (and FFFF:10h if !A20)
    mov ax,[ds:0]         ; read back, may be unnecessary (forced memory access?)
    cmp ax,[es:10h]       ; compare 0:0 ~1M word to 1M word, only equal if A20 is disabled
    pop [WORD ds:0]       ; restore INT 0 offset

In case the comments are not clear: the strategy is to read the word at [FFFFh:10h], complement it (bitwise NOT), write the result to [0:0], read it back (for no immediately obvious reason), then compare it with the word at [FFFFh:10h]. Since a word can never equal its own complement, the two can only match if both addresses hit the same memory, i.e. if A20 is disabled. Let's call this the "new approach".

I can think of at least four potential issues with any code which tries to detect the state of A20:

1) Some systems are known to take "a long time" for the state of A20 to actually change. What happens if it changes while the detection code is executing?
2) What happens if an interrupt occurs and memory is modified? (Or if the routine is re-entered?)
3) What effect might the cache have?
4) What effect might instruction reordering have?

I am pretty sure issue 1 is not a problem, as long as the "new approach" only writes to [0:0] and not to [FFFFh:10h]. (Although I have seen code on USENET which does the latter!)
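To make the two strategies concrete, here is a toy C model of the wraparound. It is only a sketch under stated assumptions: the flat mem array, the a20_enabled flag and the helpers phys, rd16, wr16, a20_test_original and a20_test_new are all hypothetical, and real hardware masks address line 20 in the chipset or CPU, not in software. The model compares bytes where HIMEM uses cmpsd on dwords, which gives the same answer. It reproduces the false negative described above: with the gate open but both 256-byte windows zeroed, the original test reports "disabled" while the new test reports "enabled".

```c
#include <stdint.h>

/* Toy model (hypothetical): "physical" memory just large enough to reach
   FFFF:FFFF (0x10FFEF). Nothing here models caches or reordering. */
#define MEM_SIZE 0x110000u

static uint8_t mem[MEM_SIZE];
static int a20_enabled;                  /* 0 = gate closed (wraparound) */

/* Real-mode segment:offset -> physical address. With the gate closed,
   address line 20 is forced low, so FFFF:0010 aliases 0000:0000. */
static uint32_t phys(uint16_t seg, uint16_t off)
{
    uint32_t lin = ((uint32_t)seg << 4) + off;
    return a20_enabled ? lin : (lin & ~(UINT32_C(1) << 20));
}

static uint16_t rd16(uint16_t seg, uint16_t off)
{
    return (uint16_t)(mem[phys(seg, off)] |
                      (mem[phys(seg, (uint16_t)(off + 1))] << 8));
}

static void wr16(uint16_t seg, uint16_t off, uint16_t v)
{
    mem[phys(seg, off)] = (uint8_t)v;
    mem[phys(seg, (uint16_t)(off + 1))] = (uint8_t)(v >> 8);
}

/* "Original approach": compare 256 bytes at 0:0 with FFFF:10h and
   report disabled (0) if they are all equal, enabled (1) otherwise. */
static int a20_test_original(void)
{
    for (uint16_t i = 0; i < 256; i++)
        if (mem[phys(0x0000, i)] != mem[phys(0xFFFF, (uint16_t)(0x10 + i))])
            return 1;
    return 0;
}

/* "New approach": write the complement of the word at FFFF:10h to 0:0,
   read it back and compare. Equal means both addresses hit one cell. */
static int a20_test_new(void)
{
    uint16_t hi = rd16(0xFFFF, 0x0010);
    uint16_t saved = rd16(0x0000, 0x0000);   /* INT 0 vector offset */
    wr16(0x0000, 0x0000, (uint16_t)~hi);
    uint16_t back = rd16(0x0000, 0x0000);    /* the "read back" */
    int same = (back == rd16(0xFFFF, 0x0010));
    wr16(0x0000, 0x0000, saved);             /* restore INT 0 offset */
    return !same;                            /* 1 = enabled, 0 = disabled */
}
```

A caller would set a20_enabled, fill mem as desired, and call both test functions. The model deliberately ignores issues 3 and 4; it only shows what each test concludes about a given memory image.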
Issue 2 (interrupts) is a potential problem for the original approach, because cmpsd is interruptible. So if A20 is disabled and the bytes in low memory change during an interrupt, the comparison might fail and give the wrong answer. With the new code, there is a potential problem if an interrupt complements the word at FFFFh:10h while the A20 gate is open: if that happens after the write to [0:0] but before the final cmp, the two words match and the code reports A20 as disabled. One could argue that none of this is very likely.

Issue 3 (cache) is supposedly not a problem on the 486 and higher, because the processor has an "A20 input" pin (A20M#) which gets routed all the way to the top of the memory hierarchy (before the L1 cache). Or so I have read. Caching may yet pose a problem on 386 machines, which is too bad, because only the 486 and higher have the wbinvd instruction...

Finally, issue 4 (instruction reordering) explains the mysterious "read back" in the new code. Repeating the relevant lines:

    mov [ds:0],ax         ; save ~1M word to 0:0 (and FFFF:10h if !A20)
    mov ax,[ds:0]         ; read back, may be unnecessary (forced memory access?)
    cmp ax,[es:10h]       ; compare 0:0 ~1M word to 1M word, only equal if A20 is disabled

Without the "read back", the processor could reorder the instructions such that the write to [0:0] happens after the read from [FFFFh:10h]. (In theory it can do this because the addresses are different.) Somebody must have figured that a read which depends on a write will prevent the write from being reordered past the read. In other words, since the second instruction depends on the first and the third depends on the second, this code should execute in order.

Now, I am not 100% convinced that this will work. Is there any reason to think a sufficiently "smart" processor will not make mincemeat of this code sequence anyway? I am especially worried about writing to memory and reading it back into the same register; could a smart CPU optimize away the read entirely (for example, by forwarding the stored value straight from its store buffer), and then reorder the code?
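The interrupt window in the new approach can be shown with an even smaller toy model. Again this is hypothetical C, not HIMEM code: only the word at physical 0 and the word at physical 1M are modeled, and irq_hook stands in for an interrupt taken between the read-back and the final compare. A handler that complements the word at FFFF:10h turns a correct "enabled" into "disabled". With the gate closed, the same handler rewrites the very cell that was just read back, so it turns "disabled" into "enabled". Caching and reordering are not modeled.

```c
#include <stdint.h>

/* Toy model (hypothetical): cell[0] is physical 0x000000 (0:0),
   cell[1] is physical 0x100000 (FFFF:10h when A20 is enabled). */
static uint16_t cell[2];
static int a20_enabled;

static uint16_t *at_0000_0000(void) { return &cell[0]; }
/* FFFF:10h reaches the 1M cell only with the gate open; else it aliases 0:0. */
static uint16_t *at_ffff_0010(void) { return &cell[a20_enabled ? 1 : 0]; }

/* Simulated interrupt, run between the read-back and the compare,
   i.e. a point where a real IRQ could be taken if interrupts are on. */
static void (*irq_hook)(void);

static int a20_test_new(void)
{
    uint16_t hi = *at_ffff_0010();
    uint16_t saved = *at_0000_0000();     /* INT 0 vector offset */
    *at_0000_0000() = (uint16_t)~hi;
    uint16_t back = *at_0000_0000();      /* the "read back" */
    if (irq_hook)
        irq_hook();                       /* interrupt lands here */
    int same = (back == *at_ffff_0010());
    *at_0000_0000() = saved;              /* restore */
    return !same;                         /* 1 = enabled, 0 = disabled */
}

/* Hypothetical handler that complements the word at FFFF:10h. */
static void negate_ffff_0010(void)
{
    *at_ffff_0010() = (uint16_t)~*at_ffff_0010();
}
```

Clearing irq_hook gives the undisturbed result; pointing it at negate_ffff_0010 shows how a single ill-timed write flips the answer in either gate state.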
Perhaps it would be better to rewrite the sequence like this:

    mov [ds:0],ax         ; write to location 0:0
    mov ax,[es:10h]       ; read from location FFFFh:10h
    cmp ax,[ds:0h]        ; compare with [0:0]

At least this avoids the weird-looking "write to 0, read it straight back" sequence. But really, the only way to be sure might be to insert barrier instructions... I know CPUID would work (it is a serializing instruction), but it only exists on later 486s and higher. Is there any certain way to fix this which will work on 386+?

- Pat

_______________________________________________
Freedos-devel mailing list
https://lists.sourceforge.net/lists/listinfo/freedos-devel