[racket-dev] Extflonum type for windows

2013-03-04 Thread Michael Filonenko
Hello all,

The following pull request provides long double type (extflonum) on
win32: https://github.com/plt/racket/pull/265

Extflonum arithmetic is implemented in a set of functions
compiled into longdouble.dll (dll and lib attached, source
code included in the pull request). It has been compiled
with mingw-w64. Plain MinGW cannot read long double values (it has
neither a non-MSVC strtold nor a long double scanf).

All those functions accept a special long_double union:

#define SIZEOF_LONGDOUBLE 16
typedef union long_double
{
  char bytes[SIZEOF_LONGDOUBLE];
#ifdef __MINGW__
  long double val;
#endif
} long_double;

Racket's libffi uses this union for ffi facilities instead
of ffi_type_longdouble, because ffi_type_longdouble is
defined to ffi_type_double on win platforms.
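
A minimal sketch of how such a byte-level union can carry a value across
the compiler boundary. It shows only the byte view that a non-MinGW
compiler sees; ld_pack and ld_unpack are hypothetical helper names for
illustration, not functions from longdouble.c.

```c
#include <assert.h>
#include <string.h>

#define SIZEOF_LONGDOUBLE 16

/* Byte-only view of the union: code that cannot name the gcc
   `long double` type just passes these 16 bytes around. */
typedef union long_double {
  char bytes[SIZEOF_LONGDOUBLE];
} long_double;

/* The native type must fit into the byte buffer. */
static_assert(sizeof(long double) <= SIZEOF_LONGDOUBLE,
              "long double does not fit into long_double");

/* Hypothetical helper: copy a native long double into the buffer. */
long_double ld_pack(long double v) {
  long_double r;
  memset(r.bytes, 0, sizeof r.bytes);
  memcpy(r.bytes, &v, sizeof v);
  return r;
}

/* Hypothetical helper: recover the native value from the buffer. */
long double ld_unpack(long_double u) {
  long double v;
  memcpy(&v, u.bytes, sizeof v);
  return v;
}
```

A caller on the gcc side could write ld_unpack(ld_pack(x)) and get x back
unchanged; the other side never needs to interpret the bytes.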

To compile the longdouble library yourself, use the following
commands (both mingw-w64 and MSVC environments are required):

cd src/racket/src/longdouble
gcc.exe -shared -o longdouble.dll longdouble.c -Wl,--output-def,longdouble.def,--out-implib,longdouble.a -I.
lib /machine:i386 /def:longdouble.def
copy /Y longdouble.dll ..\..\..\..\lib\.

It seems that RacketCGC is supposed to be built without any
third-party DLLs (longdouble.dll being one of them), so the
following building process seems natural:

1. RacketCGC builds with Visual Studio, without MZ_LONG_DOUBLE
and without longdouble.dll. (There is currently no MZ_LONG_DOUBLE
macro in worksp/mzconfig.h.)

2. RacketCGC is called to download third-party libraries, including
longdouble.dll and longdouble.lib.

3. gc2/make.rkt is called to compile Racket3m. It uses command-line
options to define the MZ_LONG_DOUBLE macro for the compiled
code and to link it against longdouble.lib.

Matthew, could you please adjust the build scripts so that
items 2 and 3 work? Or, of course, any other way that you find
convenient.

There are some issues that I do not know how to solve, because they
depend on the Racket maintainers' working style.

Some miscellaneous notes:

1. There is currently a problem with the code generation in
foreign.rktc. As a result of foreign.rktc's code generation,
line 1974 is supposed to be:
   tmp = (mz_long_double)(SCHEME_MAYBE_LONG_DBL_VAL(val));
Unfortunately, that does not compile, saying
   error C2440: 'type cast' : cannot convert from 'long_double' to 'mz_long_double'
So we must not cast the structure value type to itself.
The good news is that simple assignment without casting works fine.
There probably needs to be an option in the foreign.rktc declarations
to turn off casting in this particular case. Matthew, is it possible
to add such an option?

2. xform.rkt contains a long list of all long double arithmetic
[non-]functions.

3. In numstr.c I could not avoid separating the parsing of
double and long double values.


I have tested the code on Windows 7 x86 with MSVC 2008 (in VirtualBox).
x64 has not been tested yet.

Binary longdouble library can be found here:
https://github.com/filonenko-mikhail/racket/tree/extflonum-windows/src/racket/src/longdouble

--
With best regards, Michael Filonenko
_
  Racket Developers list:
  http://lists.racket-lang.org/dev


Re: [racket-dev] Extflonum type for windows

2013-03-04 Thread Matthew Flatt
Great! This all looks like the right idea, and I'll take a closer look
as soon as possible, but maybe not until the end of the week.



Re: [racket-dev] Extflonum type for windows

2013-03-18 Thread Matthew Flatt
At Mon, 4 Mar 2013 19:06:32 +0300, Michael Filonenko wrote:
> The following pull request provides long double type (extflonum) on
> win32: https://github.com/plt/racket/pull/265

Merged --- with some changes, as usual...

> It seems that RacketCGC is supposed to be built without any
> third-party DLLs (longdouble.dll being one of them), so the
> following building process seems natural: [...]

I changed the way that "longdouble.dll" is loaded and linked so that
`extflonum-available?' returns #f if "longdouble.dll" isn't found.
Since extflonums are not needed to build Racket 3m, that solves the
build-order problem.

> 1. There is currently a problem with the code generation in
> foreign.rktc. [...]
> Probably there is a need for an option in foreign.rktc declarations
> to turn off casting in this particular case. Matthew, it is possible
> to add such an option?

Yes, done.

> 2. xform.rkt contains a long list of all long double arithmetic
> [non-]functions.

Instead of changing "xform.rkt", I added XFORM_NONGCING annotations
to the function prototypes.

> 3. In numstr.c I could not avoid separating the parsing of
> double and long double values.

That looks ok.


Another problem is that Windows seems unhappy with changing the
precision mode. The _control87() function apparently ignores an attempt
to change the mode, and when I set the mode using a FLDCW instruction,
some library function resets it back.

After a brief and unsuccessful attempt to track down where the mode is
reset, I changed the DLL and JIT to set the mode just before performing
extflonum arithmetic, and then set it back afterward.

Of course, there can be a cost to changing the mode at such a fine
granularity. When I run the first program below in Mac OS X 64-bit
mode, I get

 'flonum
 cpu time: 483 real time: 483 gc time: 0
 1.0
 cpu time: 474 real time: 474 gc time: 0
 1.0
 cpu time: 789 real time: 787 gc time: 0
 1.0
 'extflonum
 cpu time: 641 real time: 640 gc time: 0
 1.0t0
 cpu time: 885 real time: 884 gc time: 0
 1.0t0
 cpu time: 959 real time: 958 gc time: 0
 1.0t0

but if I force the JIT to set the control word on every extflonum
operation, I get

 
 'extflonum
 cpu time: 806 real time: 806 gc time: 0
 1.0t0
 cpu time: 1054 real time: 1053 gc time: 0
 1.0t0
 cpu time: 959 real time: 957 gc time: 0
 1.0t0

It looks like division is slow enough to mask the overhead of setting
the mode. For addition and subtraction, the overhead seems to be on the
order of the cost of switching flonums to extflonums.

The JIT could be improved to avoid switching between consecutive
operations, but does the cost of this approach look reasonable as a
start?



#lang racket
(require racket/flonum
         racket/extflonum)

'flonum
(time
 (for/fold ([v 1.0]) ([i (in-range 1)])
   (fl- (fl+ v v) v)))
(time
 (for/fold ([v 1.0]) ([i (in-range 1)])
   (fl- (fl+ v v) 1.0)))
(time
 (for/fold ([v 1.0]) ([i (in-range 1)])
   (fl/ (fl* v v) v)))

'extflonum
(time
 (for/fold ([v 1.0t0]) ([i (in-range 1)])
   (extfl- (extfl+ v v) v)))
(time
 (for/fold ([v 1.0t0]) ([i (in-range 1)])
   (extfl- (extfl+ v v) 1.0t0)))
(time
 (for/fold ([v 1.0t0]) ([i (in-range 1)])
   (extfl/ (extfl* v v) v)))



Re: [racket-dev] Extflonum type for windows

2013-03-18 Thread Neil Toronto

On 03/18/2013 07:53 AM, Matthew Flatt wrote:

> The JIT could be improved to avoid switching between consecutive
> operations, but does the cost of this approach look reasonable as a
> start?


IMO, yes. The only other good options for higher precision are Racket's 
rationals and `math/bigfloat', which are both much slower. Compared to 
those, a 2x penalty for addition and subtraction is nothing.


Neil ⊥



Re: [racket-dev] Extflonum type for windows

2013-03-19 Thread Michael Filonenko
Matthew, thank you very much.

It seems that your change to switch precision on every function
call is not required on 32-bit Windows. I have prepared a small pull
request that fixes it. All tests on my VirtualBox machines
(both 32-bit and 64-bit) pass. Our own FFI-related tests pass too.

https://github.com/plt/racket/pull/280

Small note about extflonum FFI on Windows:

It is possible to use long double on Windows with the gcc (mingw,
mingw-w64) compiler. It is also possible to use a DLL compiled that way
with the Racket FFI. The one caveat is that long double data must be
16-byte aligned, so you should use the gcc command-line option
-m128bit-long-double on Win32. On Win64, the alignment is
16 bytes by default.

Please let me know if I can help further with testing or documentation.


--
With best regards, Michael Filonenko


Re: [racket-dev] Extflonum type for windows

2013-03-19 Thread Matthew Flatt
At Tue, 19 Mar 2013 22:08:41 +0400, Michael Filonenko wrote:
> It seems that your changes with precision switching on every function
> call is not required on 32-bit windows.

The 32-bit Windows build does not use SSE for flonum operations, so
setting the precision at the start to extended mode would affect flonum
arithmetic.

It's possible that we should switch the build to use SSE, but I worry
about dropping support for old processors.

Meanwhile, it happens that switching the precision at the last minute
allows the 32-bit Windows build to support extflonums without using SSE
and without affecting flonum arithmetic.

> Small note about extflonum ffi on win platforms:
> 
> It is possible to use long double on win platforms with gcc (mingw,
> mingw-w64) compiler. It is also possible to use compiled DLL with
> Racket FFI. The only one note is that long double data must be 16-byte
> aligned, therefore you should use gcc command line option
> -m128bit-long-double on win32 platform. On win64 platform aligning is
> 16 byte by default.

I don't understand. Which piece of the system requires that `long
double's are 16-byte aligned? And is this an issue about compiling
"longdouble.dll", or with other potential DLLs?

Thanks!



Re: [racket-dev] Extflonum type for windows

2013-03-20 Thread Michael Filonenko
> The 32-bit Windows build does not use SSE for flonum operations, so
> setting the precision at the start to extended mode would affect flonum
> arithmetic.

Sorry that I did not take that into account.

> It's possible that we should switch the build to use SSE, but I worry
> about dropping support for old processors.

> Meanwhile, it happens that switching the precision at the last minute
> allows the 32-bit Windows build to support extflonums without using SSE
> and without affecting flonum arithmetic.

Agreed. But since switching the processor mode "at the last minute" every
time slows things down a bit, it may be useful to have an option to
switch to the extended mode on Win32 just once. That will be useful
for us and other users who have SSE and worry about performance.
Could you please take a look at the updated pull request, which now
contains #define-guarded code?

There is a caveat: longdouble.dll would have to exist in two Win32
versions, with and without the switching. It is no problem for us to
compile a non-switching longdouble.dll on our own, but other
users may be happier if the DLL had both sets of functions ("SSE" and
"non-SSE") and Racket decided which to use when loading the
library. What do you think?

>> It is possible to use long double on win platforms with gcc (mingw,
>> mingw-w64) compiler. It is also possible to use compiled DLL with
>> Racket FFI. The only one note is that long double data must be 16-byte
>> aligned, therefore you should use gcc command line option
>> -m128bit-long-double on win32 platform.

> I don't understand. Which piece of the system requires that `long
> double's are 16-byte aligned? And is this an issue about compiling
> "longdouble.dll", or with other potential DLLs?

Never mind, that was my mistake: I forgot to clean the generated
racket3m sources in src/worksp/gc2/xsrc, so the data transfer
got out of sync. Everything is actually OK by default:
12-byte-aligned long doubles on Win32, and 16-byte-aligned
long doubles on x64.
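
A small sketch for checking what a given compiler actually uses for
long double, assuming a C11 compiler. The function names and the
ld_probe struct are hypothetical and exist only for this illustration;
the values reported depend on the target and on flags such as
-m128bit-long-double.

```c
#include <assert.h>
#include <stddef.h>

/* long double is never narrower than double in standard C. */
static_assert(sizeof(long double) >= sizeof(double),
              "unexpected long double size");

/* Storage size and required alignment chosen by the current compiler. */
size_t long_double_size(void)  { return sizeof(long double); }
size_t long_double_align(void) { return _Alignof(long double); }

/* Offset of a long double placed after a single char in a struct;
   this shows the padding the compiler inserts inside aggregates. */
struct ld_probe { char tag; long double value; };

size_t long_double_struct_offset(void) {
  return offsetof(struct ld_probe, value);
}
```

Printing these three values from both the gcc build and the Racket side
is one way to notice a layout mismatch like the one described above.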

Updated pull request:
https://github.com/plt/racket/pull/280




-- 
With best regards, Michael Filonenko


Re: [racket-dev] Extflonum type for windows

2013-03-20 Thread Neil Toronto

On 03/20/2013 05:14 AM, Michael Filonenko wrote:

>> Meanwhile, it happens that switching the precision at the last minute
>> allows the 32-bit Windows build to support extflonums without using SSE
>> and without affecting flonum arithmetic.
>
> Agreed. But since switching the processor "at last minute" every time
> slows things down a bit, it may be useful to have an option to
> switch to the extended mode on Win32 just once. That will be useful
> for us and other users who have SSE and worry about performance.
> Could you please take a look on the updated pull request which now
> contains #define-guarded code?


I expect this to cause the tests in "math/tests/flonum-tests.rkt" to fail.

I think the optimization Matthew mentioned earlier - switching to 
extended mode once for multiple operations - is a better option. The 
blocks of code guarded by mode switches should be roughly the same as 
those that stack-allocate extflonums. IOW, switching modes would mostly 
coincide with moving extflonums in and out of the heap, which is slow 
enough that the mode switches would be unnoticeable.


Neil ⊥



Re: [racket-dev] Extflonum type for windows

2013-03-30 Thread Matthew Flatt
Sorry for the long delay!

At Wed, 20 Mar 2013 16:14:46 +0400, Michael Filonenko wrote:
> Agreed. But since switching the processor "at last minute" every time
> slows things down a bit, it may be useful to have an option to
> switch to the extended mode on Win32 just once. That will be useful
> for us and other users who have SSE and worry about performance.

I worry that system libraries or other libraries may somehow rely on
double precision, and so setting the mode globally to extended
precision may cause problems --- independent of whether the rest of
Racket uses SSE for double-precision arithmetic.

But many computations are likely to work anyway, so let's set that
concern aside for the purposes of someone who wants to try extended
precision by default...


> There is a caveat that longdouble.dll should be present in two Win32
> versions -- with and without the switching.

I've changed "longdouble.dll" so that it switches the floating-point
mode and then restores it on each call, instead of always setting the
mode back to double precision after each call. That makes the DLL work
consistently with different default modes.

(Although the DLL's overhead could be lower if the context is known to
be in extended-precision mode already, I think the overhead of getting
into the DLL is so large that skipping the no-op mode switching
wouldn't matter.)
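
The save-and-restore pattern described above can be sketched as follows,
assuming gcc-style inline assembly on x86. The function names are
hypothetical and this is not the code in "longdouble.dll"; on other
targets the helpers fall back to no-ops.

```c
#include <assert.h>

/* x87 control word: bits 8-9 select the precision; both bits set
   means extended precision (64-bit significand). */
#define X87_PC_MASK     0x0300
#define X87_PC_EXTENDED 0x0300

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
unsigned short get_x87_cw(void) {
  unsigned short cw;
  __asm__ __volatile__("fnstcw %0" : "=m"(cw) : : "memory");
  return cw;
}
void set_x87_cw(unsigned short cw) {
  __asm__ __volatile__("fldcw %0" : : "m"(cw) : "memory");
}
#else
/* No x87 control word on this target: the helpers do nothing. */
unsigned short get_x87_cw(void) { return 0; }
void set_x87_cw(unsigned short cw) { (void)cw; }
#endif

/* Hypothetical entry point: save the caller's mode, force extended
   precision, do the arithmetic, then restore whatever was active. */
long double extfl_add_sketch(long double a, long double b) {
  /* volatile keeps the addition between the two mode switches */
  volatile long double x = a, y = b, r;
  unsigned short saved = get_x87_cw();
  set_x87_cw((unsigned short)((saved & ~X87_PC_MASK) | X87_PC_EXTENDED));
  r = x + y;
  set_x87_cw(saved);
  assert(get_x87_cw() == saved);
  return r;
}
```

Because the previous control word is restored rather than a fixed
double-precision value, the same function behaves consistently whether
the caller runs in double or extended mode.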


Now, you can adjust the 32-bit MSVC build so that MZ_NEED_SET_EXTFL_MODE
is not defined in "sconfig.h", and then if you add initialization to
set the x87 control word to extended precision, maybe extflonums would
work right.

Further, to avoid affecting flonums, you could get the JIT to use SSE
by defining MZ_USE_JIT_SSE. That flag doesn't cover MSVC-generated
`double' arithmetic, so you'd also have to tell MSVC to use SSE for
floating-point math. Then again, MSVC doesn't seem to have a way to say
"use only SSE"; there's a flag to enable SSE, but the compiler reserves
the right to mix SSE and x87, depending on what it thinks will be
faster.


Another possible direction is to use MinGW for a faster-extflonum
build. I've pushed some repairs so that Racket again builds with MinGW
(only extflonum-related updates were needed), and a 32-bit build using
CPPFLAGS="-mpentium4 -mfpmath=sse" provides extflonums without
last-minute switching. A 64-bit MinGW build uses SSE by default, so it
also provides extflonums without switching.

Note that a MinGW build no longer has the same behavior as an MSVC
build when SSE arithmetic is enabled for gcc, since SSE mode makes
Racket add initialization of the default floating-point mode to
extended to support extflonums. If we ever want to bring the two builds
back in sync, we can add an option. For now, though, the MinGW build
just provides a path for experimenting with floating-point modes.
