> Looking at the disassembly of -O1 code in a quest for more concise
> bytecode¹ (a quest that’s not necessarily always relevant but probably
> is at -O1), I noticed a few things:
> 1. Code for free variable lookup, emitted by
> ‘emit-cached-toplevel-box’, is too large (~7 instructions per
> variable) for little in return.
> [...]
> As for #1, I’m not sure what the best option is. I initially thought
> about adding a new macro-instruction, but then we’d lose on cache-hit
> path, which is not good.
Is this (Guile) Scheme indirection useful? IIRC/IIUC, ELF doesn’t need many
special instructions for this and instead has what it calls ‘relocations’,
which, as I understand it, have fairly minimal overhead, so I wouldn’t expect
them to benefit much from caching(*). Perhaps something akin to ELF relocations
could be both performant and compact.
(*) besides lazy relocation when not doing early binding (not sure I got
the terminology right, it has been a while)
> 2. The ‘.data’ section is surprisingly large: for each symbol in the
> source, we end up in that section with a string, a stringbuf
> (pointing to contents in the ‘.rodata’ section), and a symbol.
> More on that below.
I have heard that ELF is quite flexible. Perhaps it would be possible to let
‘stringbuf’ (I’m not familiar with that word) point to the string in the symbol
table (where “string” = “insert-procedure-name-here” and ‘symbol table’ = ELF’s
mapping from strings to procedures/variable values), eliminating the duplicate
that’s (IIUC) currently in .rodata?
> 3. ‘*lcm-page-size*’ is set to 64 KiB for the purposes of reducing the
> number of .go variants needed under prebuilt/.
> Should we default to sysconf(_SC_PAGESIZE) and use that common
> denominator only when building .go files under prebuilt/ (this
> requires adding a compiler flag to choose a different alignment)?
On using _SC_PAGESIZE: that would be non-deterministic IIUC. Some architectures
support multiple page sizes, so the page size can depend on the kernel
configuration (I don’t know whether sysconf(_SC_PAGESIZE) reports the current
page size or a common divisor of the possible page sizes). (I recall reading
something like that on lwn.net somewhere, but I don’t know whether
sysconf(_SC_PAGESIZE) itself is non-deterministic(*).)
(*) in the reproducible builds sense
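To make the concern concrete, here is a minimal sketch in C (not Guile code; the function names and the LCM_PAGE_SIZE macro are made up for illustration). It asks the running kernel for its page size and checks whether a file aligned to the 64 KiB mentioned above would also be aligned for a given page size. Whatever sysconf returns depends on the machine doing the build, which is exactly what could leak into the output:

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical name for the 64 KiB alignment discussed in this thread. */
#define LCM_PAGE_SIZE (64L * 1024L)

/* Page size reported by the running kernel, or -1 on error.
   The value depends on the build machine's kernel configuration. */
long runtime_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* True if something aligned to 'alignment' bytes is also aligned
   for pages of 'page_size' bytes, i.e. page_size divides alignment. */
bool alignment_covers(long alignment, long page_size)
{
    assert(alignment > 0);
    return page_size > 0 && alignment % page_size == 0;
}
```

With these, alignment_covers(LCM_PAGE_SIZE, runtime_page_size()) tells whether the fixed 64 KiB alignment is good enough on the current machine, without the alignment itself depending on that machine.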
Given that ‘--target’ exists, no compiler flag for choosing a different
alignment is necessary. It could perhaps be useful, but I don’t see the
necessity (supposedly larger page sizes can be more performant, at least if all
of the page is actually utilized, which doesn’t appear to be the case here).
> (In the meantime, I changed the linker to create sparse files in
> commit 112b617f5921c67b4b2c45aae39f54cccd34d7ef.)
For reproducibility of the produced tar files, if it hasn’t been done already,
I recommend adding whatever tar option sparsifies files (and also records them
as sparse), or, alternatively, recording them non-sparse: compression can
easily take care of the many zeroes.
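For what it’s worth, whether a given .go actually ended up sparse on disk can be checked by comparing its apparent size with its allocated blocks. A rough sketch in C, with made-up function names, assuming st_blocks counts 512-byte units (true on Linux, but POSIX does not fix the unit):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* True if fewer bytes are allocated than the apparent size.
   Assumption: 'blocks_512' is in 512-byte units, as on Linux. */
bool allocation_is_sparse(off_t apparent_size, blkcnt_t blocks_512)
{
    return (off_t) blocks_512 * 512 < apparent_size;
}

/* Returns 1 if the file at 'path' looks sparse, 0 if it does not,
   and -1 if it cannot be stat'ed. */
int file_is_sparse(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return allocation_is_sparse(st.st_size, st.st_blocks) ? 1 : 0;
}
```

The same file can give different answers on different file systems, which is why the archive should record sparseness one way or the other instead of inheriting it from the build machine.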
Also, a fourth option: many .go/modules come in groups. If you use one module
from the group, then you likely use (possibly indirectly) most of the others in
the group as well. As such, it may be worthwhile to stuff multiple modules in a
single .go. I imagine that would cut down on some duplication with strings (and
also perhaps give the optimiser more opportunities for deduplication and
inlining?). Perhaps it would be worthwhile to stuff all the web stuff together
in a group, the compiler stuff (minus esoteric things like brainfuck) together,
...
This doesn’t even need any compiler changes if you are willing to do things a
little manually: just compile
(begin (include "module0.scm") (include "module1.scm") ...)
to “module0.go” and let “module1.go”, “module2.go”, ... be symlinks to
“module0.go”.
Some care required for targets not supporting symlinks, but making fake
symlinks as regular files recognised by module loading code (or on lower level,
whatever) should be straightforward.
Best regards,
Maxime Devos.