At Mon, 4 Mar 2013 19:06:32 +0300, Michael Filonenko wrote:
> The following pull request provides long double type (extflonum) on
> win32: https://github.com/plt/racket/pull/265

Merged --- with some changes, as usual...

> It seems that RacketCGC is supposed to be built without any
> third-party DLLs (longdouble.dll being one of them), so the
> following building process seems natural: [...]

I changed the way that "longdouble.dll" is loaded and linked so that
`extflonum-available?' returns #f if "longdouble.dll" isn't found.
Since extflonums are not needed to build Racket 3m, that solves the
build-order problem.

> 1. There is currently a problem with the code generation in
> foreign.rktc. [...]
> Probably there is a need for an option in foreign.rktc declarations
> to turn off casting in this particular case. Matthew, is it possible
> to add such an option?

Yes, done.

> 2. xform.rkt contains a long list of all long double arithmetic
> [non-]functions.

Instead of changing "xform.rkt", I added XFORM_NONGCING annotations to
the function prototypes.

> 3. In numstr.c I could not avoid separating the parsing of
> double and long double values.

That looks ok.

Another problem is that Windows seems unhappy with changing the
precision mode. The _control87() function apparently ignores an
attempt to change the mode, and when I set the mode using a FLDCW
instruction, some library function resets it back. After a brief and
unsuccessful attempt to track down where the mode is reset, I changed
the DLL and JIT to set the mode just before performing extflonum
arithmetic, and then set it back afterward.

Of course, there can be a cost to changing the mode at such a fine
granularity. When I run the first program below in Mac OS X 64-bit
mode, I get

 'flonum
 cpu time: 483 real time: 483 gc time: 0
 1.0
 cpu time: 474 real time: 474 gc time: 0
 1.0
 cpu time: 789 real time: 787 gc time: 0
 1.0
 'extflonum
 cpu time: 641 real time: 640 gc time: 0
 1.0t0
 cpu time: 885 real time: 884 gc time: 0
 1.0t0
 cpu time: 959 real time: 958 gc time: 0
 1.0t0

but if I force the JIT to set the control word on every extflonum
operation, I get

 ....
 'extflonum
 cpu time: 806 real time: 806 gc time: 0
 1.0t0
 cpu time: 1054 real time: 1053 gc time: 0
 1.0t0
 cpu time: 959 real time: 957 gc time: 0
 1.0t0

It looks like division is slow enough to mask the overhead of setting
the mode. For addition and subtraction, the overhead seems to be on
the order of the cost of switching flonums to extflonums.

The JIT could be improved to avoid switching between consecutive
operations, but does the cost of this approach look reasonable as a
start?

----------------------------------------

#lang racket
(require racket/flonum
         racket/extflonum)

'flonum

(time (for/fold ([v 1.0]) ([i (in-range 100000000)])
        (fl- (fl+ v v) v)))

(time (for/fold ([v 1.0]) ([i (in-range 100000000)])
        (fl- (fl+ v v) 1.0)))

(time (for/fold ([v 1.0]) ([i (in-range 100000000)])
        (fl/ (fl* v v) v)))

'extflonum

(time (for/fold ([v 1.0t0]) ([i (in-range 100000000)])
        (extfl- (extfl+ v v) v)))

(time (for/fold ([v 1.0t0]) ([i (in-range 100000000)])
        (extfl- (extfl+ v v) 1.0t0)))

(time (for/fold ([v 1.0t0]) ([i (in-range 100000000)])
        (extfl/ (extfl* v v) v)))

_________________________
  Racket Developers list:
  http://lists.racket-lang.org/dev
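Since `extflonum-available?' returns #f when "longdouble.dll" isn't
found, a program that wants extended precision where possible could
branch on it. The following is only a minimal sketch of that idea, not
code from the pull request; `double-minus-one' is a hypothetical helper
name chosen for illustration, and it mirrors the add/subtract loop body
from the benchmark above.

```racket
#lang racket/base
(require racket/flonum
         racket/extflonum)

;; Hypothetical helper (not part of Racket or the pull request):
;; computes (x + x) - 1 and returns a flonum. It uses extflonum
;; arithmetic when extflonums are available and plain flonum
;; arithmetic otherwise.
(define (double-minus-one x)
  (if (extflonum-available?)
      ;; Convert to an extflonum, compute, and convert back.
      (let ([v (real->extfl x)])
        (extfl->inexact (extfl- (extfl+ v v) (real->extfl 1.0))))
      ;; Fallback: same computation on flonums.
      (fl- (fl+ x x) 1.0)))

(double-minus-one 1.0)
```

Either branch yields the same flonum for small inputs like this one;
the extflonum branch only matters when the intermediate result needs
more than double precision.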