Hi :)

A few points of information :)

On Fri 05 Jun 2020 22:50, Ludovic Courtès <l...@gnu.org> writes:

> [Sorting] the ELF sections of a .go file by size; for ‘python-xyz.go’,
> I get this:
>
> $13 = ((".rtl-text" . 3417108)
>        (".guile.arities" . 1358536)
>        (".data" . 586912)
>        (".rodata" . 361599)
>        (".symtab" . 117000)
>        (".debug_line" . 97342)
>        (".debug_info" . 54519)
>        (".guile.frame-maps" . 47114)
>        ("" . 1344)
>        (".guile.arities.strtab" . 681)
>        ("" . 232)
>        (".shstrtab" . 229)
>        (".dynamic" . 112)
>        (".debug_str" . 87)
>        (".strtab" . 75)
>        (".debug_abbrev" . 65)
>        (".guile.docstrs.strtab" . 1)
>        ("" . 0)
>        (".guile.procprops" . 0)
>        (".guile.docstrs" . 0)
>        (".debug_loc" . 0))
>
> More than half of those 6 MiB is code, and more than 1 MiB is
> “.guile.arities” (info "(guile) Object File Format"), which is
> surprisingly large; presumably the file only contains thunks (the
> ‘thunked’ fields of <package>).

The .guile.arities section starts with a sorted array of fixed-size
headers, followed by a sequence of ULEB128 references to local
variable names, including non-arguments.  The size is a bit perplexing,
I agree.  I can think of a number of ways to encode that section
differently, but we'd need to understand a bit more about it and why
the baseline compiler is significantly different.

> Stripping the .debug_* sections (if that works) clearly wouldn’t help.

I believe that it should eventually be possible to strip
.guile.arities, fwiw.

> So I guess we could generate less code (reduce ‘.rtl-text’), perhaps by
> tweaking ‘define-record-type*’, but I have little hope there.

Hehe :)  As you mention later:

> With 3.0.3-to-be and -O1, python-xyz.go weighs in at 3.4 MiB instead of
> 5.9 MiB!  Here’s the section size distribution:
>
> $4 = ((".rtl-text" . 2101168)
>       (".data" . 586392)
>       (".rodata" . 360703)
>       (".guile.arities" . 193106)
>       (".symtab" . 117000)
>       (".debug_line" . 76685)
>       (".debug_info" . 53513)
>       ("" . 1280)
>       (".guile.arities.strtab" . 517)
>       ("" . 232)
>       (".shstrtab" . 211)
>       (".dynamic" . 96)
>       (".debug_str" . 87)
>       (".strtab" . 75)
>       (".debug_abbrev" . 56)
>       (".guile.docstrs.strtab" . 1)
>       ("" . 0)
>       (".guile.procprops" . 0)
>       (".guile.docstrs" . 0)
>       (".debug_loc" . 0))
> scheme@(guile-user)> (stat:size (stat go))
> $5 = 3519323
>
> “.rtl-text” is 38% smaller and “.guile.arities” is almost a tenth of
> what it was.

The difference in the text is due to the new baseline intrinsics, e.g.
$vector-ref.  This goes in the opposite direction from instruction
explosion, which sought to (1) make the JIT compiler easier by
decomposing compound operations into their atomic parts, (2) make the
optimizer learn more information from flow rather than type-checking
side effects, and (3) allow the optimizer to eliminate / hoist / move
the component pieces of macro-operations.

However, in the baseline compiler (2) and (3) aren't possible because
there is no optimizer on that level, and therefore the result is
actually a lose -- 10 micro-ops cost more than 1 macro-op because of
stack traffic overhead, which isn't currently mitigated by the JIT (1).

So instruction explosion is residual code explosion, which should pay
off in theory, but not for the baseline compiler.  So I added new
intrinsics for e.g. $vector-ref et al.  Thus the smaller code size.

I am not sure what causes the significantly different .guile.arities
size!

> Something’s going on here!  Thoughts?

There are more possibilities for making code size smaller, e.g. having
two equivalent encodings for bytecode, where one is smaller:

  https://webkit.org/blog/9329/a-new-bytecode-format-for-javascriptcore/

Or it could be that if we could do register allocation for a
target-dependent fixed set of registers in bytecode already, that could
decrease minimum instruction size, making more instructions fit into
single 32-bit words.  It would be nice if the JIT could rely on the
bytecode compiler to already have done register allocation, and to
reify corresponding debug information.
Just a thought though, and not really appropriate to the baseline
compiler.

Cheers,

Andy
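
For anyone who wants to poke at the ULEB128 data mentioned above, here
is a minimal sketch of decoding one ULEB128 value from a bytevector.
The procedure name read-uleb128 is made up for illustration and is not
Guile's own reader; where the variable-name references start inside
.guile.arities is not covered here, so the caller has to supply the
offset.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper, not part of Guile's API.
;; Decode one unsigned LEB128 integer from BV starting at index POS.
;; Returns two values: the decoded integer and the index just past it.
(define (read-uleb128 bv pos)
  (let lp ((pos pos) (shift 0) (result 0))
    (let* ((byte (bytevector-u8-ref bv pos))
           ;; The low 7 bits of each byte carry payload, least
           ;; significant group first.
           (result (logior result (ash (logand byte #x7f) shift))))
      (if (zero? (logand byte #x80))
          ;; High bit clear: this was the last byte of the value.
          (values result (1+ pos))
          ;; High bit set: more bytes follow.
          (lp (1+ pos) (+ shift 7) result)))))
```

For example, (read-uleb128 #vu8(#xe5 #x8e #x26) 0) returns 624485 and
3.  Values below 128 take a single byte, so references to small string
table offsets stay compact.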