On Tue, 8 May 2012, John Harrison wrote:

> | This must have been a really long time ago.  I've got exposed to SML 
> in | 1993 and it had already the integer arithmetic defined properly, as 
> far as | I remember. In any case it is all hard and fast in SML'97. 
> The Standard | document is not perfect, but its flaws are tiny compared 
> to anything else | in the greater ML family, or the world out there.
>
> I was thinking more specifically of the formal semantics in "The 
> Definition of Standard ML", which said almost nothing about integer 
> arithmetic. But for all I know that document may also have evolved.

It could well be that the SML90 standard did not specify int precisely. 
I remember from that time that Poly/ML already had "proper int", i.e. 
practically unbounded ones, and SML/NJ 0.93 had 31 bits.  But I am 
relatively sure that bounded ints were already required to raise Overflow 
exception explicitly, though.

In SML97 things were defined more precisely, providing a menagerie of 
fixed-precision ints (Int31, Int32, Int64 etc. all with proper Overflow), 
with bottom and top elements Int and IntInf of unspecified precision. 
David Matthews had to invest extra efforts to support all this in later 
versions of Poly/ML when he moved fro SML90 to SML97, but he retained Int 
= IntInf as the quasi-unbounded one (in accordance to the standard).

SML97 also introduced a lot of Word8/16/32/64 types, in the manner of 
Java, but they call it "int" or "long" there.  I try to ignore this 
intrusion of word arithmetic into symbolic computation as much as 
possible.


> I still don't much like the fact that SML leaves integer arithmetic 
> underspecified. For those of us in symbolic computing, it would be 
> really nice if we could just treat integers as integers and the only 
> failure could be running out of memory.

Excactly.

We traditionally support both SML/NJ and Poly/ML in Isabelle, and some 
users occasionally hit the wall with the differint int types, until I made 
an end of it in 2007/2008 with some tricks on the static phase on SML/NJ 
to enforce its Int = IntInf.  This made the thing 30% slower in total, 
which is not so much, considering the really slow IntInf implementation of 
NJ (as they admit that in their sources).

So we still conform to the standard, and make things convenient for 
symbolic computation at the same time.


> | Poly/ML manages to implement proper big ints efficiently, with | 
> silent upgrade of machine word arithmetic to library functions, | when 
> overflow occurs in the hardware.  (This is actually a | vulnerability in 
> the sense that the GMP library used here is very | complex and not 
> verified, as far as I know.)
>
> Does the Isabelle core actually depend on the correctness of integer 
> arithmetic?

I've asked myself the question occasionally over the years.  There has 
also been an incident once, where a repository version of Poly/ML was 
producing wrong results with big ints, when run on multicore hardware with 
its complex memory model.

David Matthews now provides a tracing facility to count the uses of 
integer arithmetic outside the machine word size.  It shows that on 64 bit 
hardware, one can easily have a kernel implementation with non-big ints 
(still with the proper check of overflow).

Oddly, the main everyday application of quite large integers for 
administrative purposes was to provide time-stamps in nanoseconds in the 
run-time system.

There are also some dedicated Isabelle/HOL applications where really big 
ints occur: the code generator is used to turn HOL specifications into 
corresponding ML that is then executed to normalize certain expressions. 
It is the well-known not-quite-LCF-style "reflection" with the Isabelle 
code generator used in some special proof methods outside the main 
library.


> There are two uses of ints in the HOL Light core that I can 
> think of:
>
> * The term ordering uses {-1,0,1} for {<,=,>}. It might be nice to
>   replace this with some custom ternary type, but at least this
>   doesn't get into the range where machine arithmetic is an issue
>   (except on a 1-bit machine, perhaps).

That's this aftertaste of C in OCaml again :-) :-( The SML Basis Library 
is not very pretty, but it uses datatype ord = LESS | EQUAL | GREATER in 
the obvious manner.  But I don't see any problem here, apart from weak 
typing of operations.


> * Each type constructor has an int for its arity and when the user
>   tries to construct a type with it, this value is compared with the
>   length (also as an int) of the list of type arguments. So in
>   principle you could declare a type constructor with arity n and
>   later give it n + 2^31 arguments (on a 32-bit platform). This is
>   a bit of a rough edge, but of somewhat academic interest, and I
>   don't think it implies a soundness problem (you could imagine
>   allowing type constructors with the same name but different
>   arities).

A related incident in Isabelle many years ago: As you know we have 
de-Bruijn indices for bound variables (free ones are named as in 
HOL-Light).  Larry used to have some code that would convert the index 
(unbounded integer) back and forth wrt. a singleton string (essentially an 
8bit integer simulated in SML90 without char type).  When David von Oheimb 
did his Bali formalizations in the late 1990-ies, he hit that wall with 
terms that had more than 255 nested abstractions.  Luckily it was an 
exception, not a kernel flaw.

Back then, I repaired Larry's code and resisted the temptation to upgrade 
the 8bit integer into 16bit.  Instead I made an unbounded representation 
within the string type in the ovious way.


Anyway, for LCF-style kernel design type int and string are often 
interchangeable to identify formal entities.  We have now both 
side-by-side for the main formal entities: types, consts, axioms etc. all 
have a fully-qualified name and a unique integer id in the internal 
tables.  For both I expect immutability and unboundedness, at least in the 
usual approximative sense of concrete machines running out of memory 
eventually.

The main critical operation of the sub-kernel infrastructure is now a 
counter to produce fresh integers.  One could probably do this with 64bit 
word arithmetic and explicit bounds check equally well; it means the 
machine will stop working at some point in the future, and needs to be 
restarted.


> Still, maybe Mark Adams will find a flaw in that reasoning and then I'll 
> change to using unit list instead of int!

For such a high-assurange project, I would recommend judicious use of 
immutable strings or 64 bit bounded machine integers (with explicit 
overflow check), depending on the situation.


        Makarius


------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
hol-info mailing list
hol-info@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/hol-info

Reply via email to