Hi, In the last update (https://lists.gnu.org/archive/html/guile-devel/2017-11/msg00010.html) I talked a lot about fixnums. This last couple weeks' work is along the same lines.
Firstly I added support for specialized instructions that compare (in the < or = sense) s64 and u64 values against 8-bit immediates. Then I started to do some "exploding" of the comparison operators: for example, if a number is a 64-bit integer, maybe first see if it's a fixnum and take a fast path there, otherwise call out to a scm->u64 primitive. I was thinking maybe we would be able to inline even scm->u64 entirely, but this turned out to be a bad idea, and indeed even the speculative inlining of the fixnum case of e.g. < wasn't that great. The problem is that if you replace a simple primcall (define x FOO) with a diamond control structure like (define x (if FIXNUM? FOO-1 FOO-2)) early in the compiler, that makes it harder to do code motion like CSE or LICM on X, as X now has two definitions. It could make sense to reify a diamond control flow more towards the compiler back-end, but not in the beginning. Instruction explosion only makes sense if some of the exploded instructions can be eliminated or are themselves subject to CSE. If it's just exploding instructions that can never be eliminated, probably it's better to call out to the runtime (as in the case of scm->u64, unless it can be reduced to untag-fixnum). By this time I had backed myself into a little corner with number specialization, so I had to back out and fix my mess; took a week or so, but then I was back on track, with a new idea. The problem I was looking to solve was to hoist a fixnum? check above code like (= x (logand x #xff)). This predicate will only succeed if X is a fixnum. I then realized it would be tricky, as it would involve hoisting the assertion made inside logand that X is an exact integer "above" (but to where?), and it could be that the assertion doesn't commute with other effects so it can't be hoisted. So I modified the problem :) Instead I wanted to optimize this kind of code: (define (t x) (unless (exact-integer? x) (error "expected an integer" x)) (unless (= x (logand x #xff) (error "out of range" x)) (* x 2)) or (define (t x) (unless (exact-integer? x) (error "expected an integer" x)) (unless (<= -10 x 100) (error "out of range" x)) (* x 2)) Here exact-integer? is effectively (or (fixnum? x) (bignum? x)), so we have a more precise type of X flowing into the predicate. However it's not magical; the "=" in the first example still sees that X could be a bignum, and likewise for the <= operations. What you want to do here is to "devirtualize" part of this function. This is a compiler pass that effectively inlines the concrete implementations of a virtual operation like =, logand, or <=. But you don't want to devirtualize just one operation; you want to devirtualize a trace of operations. Like this: (define (out-of-range x) (error "out of range" x)) (define (not-int x) (error "expected an integer" x)) (cond ((fixnum? x) (if (<= -10 x 100) (* x 2) (out-of-range x))) ((bignum? x) (if (<= -10 x 100) (* x 2) (out-of-range x))) (else (not-int x))) This causes code growth initially, but probably causes shrinkage later; concretely this code optimizes to: (define (out-of-range x) (error "out of range" x)) (define (not-int x) (error "expected an integer" x)) (cond ((fixnum? x) (let ((x* (untag-fixnum x))) (if (s64<= -10 x*) (if (s64<= x* 100) (tag-fixnum (s64+ x* x*)) (out-of-range x)) (out-of-range x)))) ((bignum? x) (out-of-range x) (else (error "expected an integer" x))) which is indeed what we get in code: Disassembly of #<procedure t (x)> at #xb437a0: 0 (assert-nargs-ee/locals 2 0) ;; 2 slots (1 arg) at (unknown file):10:0 1 (immediate-tag=? 0 3 2) ;; fixnum? at (unknown file):11:10 3 (jne 11) ;; -> L1 4 (untag-fixnum 1 0) at (unknown file):12:10 5 (s64-imm<? 1 -10) 6 (jl 14) ;; -> L2 7 (imm-s64<? 1 100) 8 (jl 12) ;; -> L2 9 (mov 0 1) at (unknown file):13:2 10 (uadd 1 0 1) 11 (tag-fixnum 0 1) 12 (handle-interrupts) 13 (return-values 2) ;; 1 value L1: 14 (immediate-tag=? 0 7 0) ;; heap-object? at (unknown file):11:10 16 (jne 8) ;; -> L3 17 (heap-tag=? 0 4095 279) ;; bignum? 19 (jne 5) ;; -> L3 L2: 20 (throw/value 0 138) ;; #(misc-error #f "out of range ~S") at (unknown file):12:25 22 (handle-interrupts) 23 (return-values 1) ;; 0 values L3: 24 (throw/value 0 150) ;; #(misc-error #f "expected an integer ~S") at (unknown file):11:29 26 (handle-interrupts) 27 (return-values 1) ;; 0 values Pretty good code I think, though I should probably improve the disassembler to make more readable disassemblies. I would be interested if anyone knows if other compilers do this. It would appear that GCC and LLVM do not: https://www.godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAKxAEZSBnVAV2OUxAHIBSAJgGY8AO2QAbZlgDU3fgGEGBfEIIA6BDOzcADAEFtOhcWbICkgEZ5gQ5gFsZAIX37mQvKiGThBTMEzFpAOyOupKhYcxeAA4ExAD6pgBmeAAe1nb8wTph2ZKGxqYWVraSAFSFaQ5OAQAilbrOru6eyj5%2B5pZpsQCG6Oi0khAubh5ercQAlHU6Q02jvv5RMfHxqLHlxRCLcQSTGU66Cl0EeMjNosKYkhHKABzxngyxSakbMyMt856T9UH6OZLETAEVgeTYqZ5paS8ABsklo42k/FqSLhU24NX2BgIRxOZwuV0a728ny2yyITxSkM2ymi20kyW%2Bel%2BIRyb2axLagNMMmqgXsknBlOKPIGUGuBFp8QRDMRshksjhCPRCv66ORmX%2BoS5IIBQLRNVCmMOx1OwnOQkubLmbVJBBWsWt/mpEqWpgZVQ1OTwCVFYtt0qV/E0Qck0N4Sphir%2Bmq1QJ1tvt6xsA3dexZMdCmFEDEw0YzOW1xCJkrt5IhG1TmTVRuxJrxFvZJck5Zs9ttAytHxtjPRnuyhdBzpLCLwgpeNkDwewqLTTNq9QOtdxZvxnY5/hb3V6/UGhPZYy%2BHrzYQA9CfJAB1S4AayEqAA7pIekpgHDJPf3GBOKZUAA3PwJKID6SAQCCXLaKjHv2cZFo2rptl2TqbuS7abBG/Lwvq856IuOKmkI5qWnujpPtuHbEYhh4/H2YQDg8FLjgMeCBgAYs2Qqts%2BO7MZIIDtEUnFkWhWGYmuB7PgALORwz7iSPbMlkBYwR4XEDKpEDqepzHjDpWGcOMpCiFwACsnCkEIXBaGZqBcHKvCOPZuQsGwlx8PwtBmQQln6QZ14gBJACcKgBSFoVhWFhlcBJZkWZwVmkDZnBmQwIBaKQXlxfppBwLASBoDYkR4KIfjkJQ%2BWFcVxAoKIXRCMAxlaGlSSiN4xApRAZjeaQFhCF0xAAJ5cB5pD5TYmDKAA8gRg2ZaQWA2LVwDFV1%2BCAiYeD/ils2YMkmDIMw3hDWZoxGbNMR4DY3kGecZgpZABmoNEwxbQAtBN/DJc57B0NdJkxV1iXJDc0IvdCUliItkjGSoWgwwMuCECQULuaQkiyKgBVFW0bnwp5V2%2Bf5QXhcToWRZw0XmQDXDJal6X42TvD/bNiV45lOmkP%2BbXDP5QA I.e. each individual fixnum op is inlined, but GCC and LLVM don't peel out a trace for the fixnum case. I call this pass "integer devirtualization" but also use the term "peeling a trace". Words! So that's my status. Next up, I need to fix <= and >= on NaN values, as I didn't get to that from last time; then I will move on to instruction explosion of vector-ref and related instructions. But first, I promised Ludovic I would fix the compiler speed issue in slot allocation, both for 2.2 and 3.0, so I will do that :) Cheers, Andy