Thanks! I can work with this, though I'd like to start merging after the next release branch on January 7.
Much of the cut-and-paste is difficult to abstract over, but I worry
about the amount of cut-and-paste in the JIT. The inlined arithmetic
functions, like scheme_generate_arith(), seem like too much to
duplicate. I'm willing to work on replacing cut-and-paste with
abstraction when I merge the changes, but anything you can do to reduce
the cut-and-paste would be appreciated.

I don't think we should use "l" in literal numbers to mean extfl,
because those literals are already valid numbers. In other words, using
"l" for extfls would be a backward-incompatible change. I think we may
need to leave "l" as `double' and pick a different letter to indicate
`long double'.

At Sat, 22 Dec 2012 20:24:15 +0300, Michael Filonenko wrote:
> Hi all.
>
> Modern FPUs can accelerate three types of floating-point arithmetic:
> single (32 bit), double (64 bit), long double (80 bit).
>
> Currently Racket supports single and double precisions (flonum) and
> is able to JIT operations on them.
>
> The task that is currently being done is adding long double type
> (hereinafter extflonum) arithmetic into racket, along with the
> corresponding vector type (hereinafter extflvector).
>
> Here we go:
>
> "Long double" requires modification of three racket parts:
> - C core;
> - JIT;
> - Racket library.
>
> Also, long double arithmetic requires setting "extended mode" flag on
> FPU, which forces the FPU to use 80-bit registers. The side effect of
> that flag is that the FPU gives slightly different (more accurate, but
> not IEEE-compliant) results for 64-bit operations. That is usually
> not a problem on machines that have SSE2 (introduced in Pentium 4 in
> 2001). In presence of SSE2, Racket performs 64-bit operations solely
> on 64-bit SSE2 registers (see MZ_USE_JIT_SSE and -mfpmath=sse), so the
> results are IEEE-compliant. 80-bit operations are done on FPU anyway
> as SSE2 can not do them.
> Therefore, by setting the "extended mode" on
> FPU, we introduce a subtle difference in ordinary flonums, but only on
> old machines that do not have SSE2. Also, on PowerPC machines
> the whole thing with extflonums will be canceled because they
> do not have 80-bit registers.
>
> As for the C core: extended floating-point arithmetic is supported by
> gcc compiler on Linux, so build scripts for Linux do not require any
> change. Windows is another story. MSVC, commonly used to build Racket
> for Windows, does not support anything besides double precision. So we
> are forced to use gcc for Windows build, too. Cygwin's gcc is not a
> good option for us, because it denies the opportunity to use standard
> Windows GUI libraries etc. The other options are mingw (32-bit only)
> and mingw-w64 (both 32 and 64 bit). Many thanks to Matthew Flatt for
> his effort to port Racket to mingw.
> (Yet another option is Intel compiler, but I have not looked into it yet.)
>
> Extflonums are tested on Linux x86_64, and Windows 7 x86 (VirtualBox).
>
> I try to keep my modifications separate from other code. That requires
> much copy-paste, but hopefully makes my code easier to understand.
>
> === Miscellaneous notes:
>
> * Extflonum has text representation with "l" suffix
> (similar to "f" suffix for single flonums).
>
> * Extflonum is not integrated into existing racket arithmetic (so,
> (+ 123.0l0 513.0l0) is not possible). Extflonums have their own set of
> functions: extfl+, extfl-, extfl*, unsafe-extfl+, unsafe-extfl-, etc
> (similarly to flonums). The only Racket functions that were modified
> are reader, printer, and "equal?" (see below).
>
> * The macro that guards the extflonum code is MZ_LONG_DOUBLE.
> The config definition is MZ_USE_LONG_DOUBLE, which enables MZ_LONG_DOUBLE.
> The configuration scripts were not modified.
>
> * The macros MZ_LONG_DOUBLE_DISABLED and USE_EXTFLONUM_UNBOXING should be
> undefined; these are for the unboxing optimization, which will come in
> the future.
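On the MZ_LONG_DOUBLE guard: since MSVC and PowerPC do not provide an
80-bit `long double', it may be worth checking the format at compile
time in addition to the config flag. A minimal sketch, assuming a
made-up macro name EXAMPLE_HAVE_X87_EXTENDED and a made-up helper
function (neither is in the patch). LDBL_MANT_DIG is the standard
<float.h> macro; it is 64 for the x87 extended format and 53 where
`long double' is just `double':

```c
#include <assert.h>
#include <float.h>

/* Hypothetical name, not part of the patch: 1 only when the compiler's
   long double has a 64-bit significand, as the x87 80-bit extended
   format does; 0 otherwise (e.g. when long double is plain double). */
#if LDBL_MANT_DIG == 64
# define EXAMPLE_HAVE_X87_EXTENDED 1
#else
# define EXAMPLE_HAVE_X87_EXTENDED 0
#endif

/* Hypothetical helper: reports whether long double looks like the
   x87 extended format on this compiler and target. */
int example_long_double_is_x87_extended(void)
{
  /* Sanity check: long double is never narrower than double. */
  assert(sizeof(long double) >= sizeof(double));
  return EXAMPLE_HAVE_X87_EXTENDED;
}
```

Something along these lines could turn a misconfigured build (the flag
set on a target without 80-bit support) into a compile-time decision
instead of a runtime surprise.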
>
> Changes:
>
> * C core was extended with following types and constants:
>   C structs:
>     Scheme_Long_Double (extflonum)
>     Scheme_Long_Double_Vector (extflvector)
>   constants:
>     scheme_long_double_type
>     scheme_extflvector_type
>
> * Racket reader and printer were modified for reading extflonums (with
> suffix "l0"). Racket printer was modified for printing extflvectors
> (with "#extfl" prefix). Racket "equal?" function was modified to
> support extflonums (the purpose of doing that is that I needed
> rackunit to work with extflonums).
>
> * xform compiler was extended for handling long double functions,
> such as cosl, sinl, floorl, etc.
>
> * GNU lightning was extended with explicit jit fpu operations: fp-extfpu.h
>
> * Racket collections were extended with racket/extflonum.rkt module,
> which exports both safe and unsafe functions for extflonums and
> extflvectors.
>
> === Notes on JIT changes
>
> JIT contains two optimizations for flonums.
> First is compiling racket code with inlined flonum functions.
> Second is unboxing flonums to temporary storage when it
> is possible, avoiding overhead with Scheme_Double object.
>
> I have added only the first optimization for extflonum, by
> copy-pasting and modifying the original flonum
> code. Unboxing extflonums is not implemented yet.
>
> long double occupies 12 bytes on x86 and 16 bytes on x86_64.
> That is important for vector accessors generated by JIT for extflvectors.
> It is implemented by the following code
>
> #ifdef MZ_LONG_DOUBLE
> # ifdef MZ_USE_JIT_X86_64
> #  define JIT_LOG_LONG_DOUBLE_SIZE 4
> #  define JIT_LONG_DOUBLE_SIZE (1 << JIT_LOG_LONG_DOUBLE_SIZE)
> # else
> #  define JIT_LOG_LONG_DOUBLE_SIZE not_implemented
> #  define JIT_LONG_DOUBLE_SIZE 12
> # endif
> #endif
>
> So that the jit code generation is:
>
> #ifdef MZ_USE_JIT_X86_64
>   jit_lshi_ul(JIT_V1, JIT_V1, JIT_LOG_LONG_DOUBLE_SIZE);
> #else
>   jit_muli_ui(JIT_V1, JIT_V1, JIT_LONG_DOUBLE_SIZE);
> #endif
>
> JIT sometimes retains flonum into special buffer.
> I use it in the following not nice way:
>
> #ifdef MZ_LONG_DOUBLE
> long double *scheme_mz_retain_long_double(mz_jit_state *jitter,
>                                           long double ld)
> {
>   /* TODO dirty hack to save long double into two cells of double */
>   void *p;
>   if (jitter->retain_start)
>     memcpy(&jitter->retain_double_start[jitter->retained_double], &ld,
>            sizeof(long double));
>   p = jitter->retain_double_start + jitter->retained_double;
>   jitter->retained_double++;
>   jitter->retained_double++;
>   return p;
> }
> #endif
>
> I have modified the place where flonum is unboxed from Scheme_Double
> structure. I used jitter->unbox_extflonum flag for this.
>
> #ifdef MZ_LONG_DOUBLE
>   if (jitter->unbox_extflonum) {
>     fpr0 = JIT_FPU_FPR_0(jitter->unbox_depth);
>     jit_fpu_ldxi_ld_fppush(fpr0, target,
>                            &((Scheme_Long_Double*)0x0)->long_double_val);
>     jitter->unbox_depth++;
>   } else
> #endif
>   {
>     fpr0 = JIT_FPR_0(jitter->unbox_depth);
>     jit_ldxi_d_fppush(fpr0, target, &((Scheme_Double *)0x0)->double_val);
>     jitter->unbox_depth++;
>   }
>
> === Tests
>
> There are tests for extflonums and extflvectors, done by
> copy-pasting the tests for flonums and flvectors:
>
> collects/racket/extflonum.rkt
> collects/tests/racket/extfl-unsafe.rktl
>
> The patch is attached; I think it can be applied to the development
> branch of Racket.
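On the element-size code in the JIT notes: the shift works on x86_64
only because 16 is a power of two; 12 is not, so the x86 path needs the
multiply. A minimal sketch in plain C of the two index computations
(the function names are made up for illustration and are not in the
patch; the constants 4, 16 and 12 are the ones quoted above):

```c
#include <stddef.h>

/* Hypothetical helpers, not part of the patch. */

/* Byte offset of element `index` when the element size is
   1 << log_size (the x86_64 case: log_size 4, element size 16). */
size_t example_offset_by_shift(size_t index, unsigned log_size)
{
  return index << log_size;
}

/* Byte offset of element `index` for an arbitrary element size
   (the x86 case: element size 12, which is not a power of two). */
size_t example_offset_by_mul(size_t index, size_t elem_size)
{
  return index * elem_size;
}
```

For index 3, the shift with log size 4 and the multiply by 16 agree on
48, while the 12-byte layout gives 36 and has no single-shift
equivalent.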
>
> === Documentation
>
> There is documentation in file
> collects/scribblings/reference/extflonums.scrbl,
> which contains mostly copy-paste from flonum.scrbl with small
> adaptation for extflonum.
_________________________
  Racket Developers list:
  http://lists.racket-lang.org/dev