https://sourceware.org/bugzilla/show_bug.cgi?id=31964
Bug ID: 31964
Summary: Add directive for more efficient encoding of binary data
Product: binutils
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: gas
Assignee: unassigned at sourceware dot org
Reporter: jakub at redhat dot com
Target Milestone: ---

When GCC emits binary data into assembler output today, whether for LTO sections with -flto or for large variable initializers, it uses either the .ascii/.string directives, like

	.string "\304\2347a@\355\004\302tL\302\\\260O>6D\347\266\2527`\200\355\004\276\276L\302j\330'\0279D\347\262\2347`\326v\002_%&a-\354S\017\017\3219i\365\006\214\250\235@\027\324$,\205}\352\271!:G\345\274\331\"l'\020\345f\362U\310>\341\350\020\225+\254\336l\305\265\023\210z1\371\002d\237nz\210\312\3759o\266\216\332\t<EM\276\350\330\247\033\034\242r|\325\233\255\234v\002M{&_j\354\223\315\016Q\271;\347\215V];\201&\035\223\2572\366\311\306\206\250\034\\\365FK\251\235\300\322\227\311W\225}\252\311!*\367\346\274\321\332i'\260Tb\362\025\305>\321\360\020\225S\253\336d\345\265\023H\202\232|1\261O47D\345\320\2347YN\355\004\220\334L\276\202\330\247\031\035\242ra\325\233\254\236v\002H/&_=\354\323\263\207\250\334\226\363\246\312\327N"
	.ascii "\0325\371\242a\237\2368D\345\252\325\233*[;\201\242=\223"
	.string "/\026\366\211\335!*'u\336T\201\332\t\024\351\230|\241\260O\254\r\321\270\303\352\215\025`;\001\234/\223.B\366\251%\207h\\\241\363\306\312\332N"
	.ascii "\247\304\244\013\217}b\341!\032\347W\275\261\"j'`\0035\351\232"
	.ascii "c\237Xn\210\306\3619o\244\204\355\004h\334L\272\322\330\247\025"

or emits it as a sequence of .byte directives

	.byte 127
	.byte 69
	.byte 76
	.byte 70
	.byte 2
	.byte 1
	.byte 1
	.byte 3
	.byte 0
	.byte 0

For ASCII or mostly ASCII data, .string/.ascii are just fine, with 1 or slightly more than 1 assembly character per data section byte. But as can be seen above, non-ASCII values take 4 characters per byte, and the .byte sequence takes up to 11 characters per byte.
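To put rough numbers on the per-byte overhead described above, here is a small C sketch that counts assembly characters for each form. The function names are made up for illustration, and the cost model is an assumption: every .byte line is taken to be a tab, ".byte", a space, the decimal value and a newline, and every non-printable byte in a string is counted as a three-digit octal escape (real output can use shorter escapes such as \t, so the string figure is an upper bound).

```c
#include <stddef.h>

/* Characters for one "\t.byte N\n" line per data byte:
   tab + ".byte" + space = 7, then 1-3 decimal digits, then the newline. */
size_t byte_directive_cost(const unsigned char *data, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        size_t digits = data[i] >= 100 ? 3 : data[i] >= 10 ? 2 : 1;
        total += 7 + digits + 1;
    }
    return total;
}

/* Characters inside the quotes of a .string/.ascii body (assumed model):
   '"' and '\\' need a backslash (2 chars), other printable ASCII costs 1,
   everything else is counted as a 3-digit octal escape (4 chars). */
size_t string_body_cost(const unsigned char *data, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = data[i];
        if (c == '"' || c == '\\')
            total += 2;
        else if (c >= 0x20 && c < 0x7f)
            total += 1;
        else
            total += 4;
    }
    return total;
}

/* Characters of padded base64 text for n data bytes: 4 per group of 3. */
size_t base64_cost(size_t n)
{
    return 4 * ((n + 2) / 3);
}
```

Under these assumptions, the ten bytes from the .byte example above cost 95 characters as .byte lines, 31 characters as a string body, and 16 characters as base64.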
I've been wondering whether gas could add a .base64 directive. Base64 encoding/decoding is pretty fast and can be implemented efficiently in a few lines of C or C++ code. It is something I'm also proposing for #embed preprocessing.

Perhaps the .base64 argument could be a string, like:

	.base64 "RUxGAgEBAwAAAAAAAAAAAgA+AAEAAABQ00AAAAAAAEAAAAAAAAAA2CBLEAAAAAAAAAAAQAA4AA4A"
	.base64 "QAAsACsABgAAAAQAAABAAAAAAAAAAEAAQAAAAAAAQABAAAAAAAAQAwAAAAAAABADAAAAAAAACAAA"
	.base64 "AAAAAAAAAAAAAAAAAAA="

https://datatracker.ietf.org/doc/html/rfc4648#section-4

I'd probably not add any requirements on the line (string) length, and not accept line breaks or any characters other than [A-Za-z0-9+/=]. I'd just require that the string length is a multiple of 4 characters, so that each string is valid base64 on its own, with at most = or == at the end and no other = characters. So even

	.base64 "RUxGAgEBAwAAAAAAAAAAAgA+AAEAAABQ00AAAAAAAEAAAAAAAAAA2CBLEAAAAAAAAAAAQAA4AA4AQAAsACsABgAAAAQAAABAAAAAAAAAAEAAQAAAAAAAQABAAAAAAAAQAwAAAAAAABADAAAAAAAACAAAAAAAAAAAAAAAAAAAAAA="

etc. would be valid.
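The acceptance rules proposed above are small enough to sketch in C. This is only an illustration of those rules, not gas code: the names b64_value and b64_decode_strict are hypothetical, and the sketch assumes the caller supplies an output buffer of at least 3 * (strlen(in) / 4) bytes. It rejects any string whose length is not a multiple of 4, any character outside [A-Za-z0-9+/], and any = that is not part of a trailing = or ==. It does not check that the unused low bits before the padding are zero, which the proposal does not address.

```c
#include <stddef.h>
#include <string.h>

/* Map one character of the RFC 4648 section 4 alphabet to its 6-bit value,
   or return -1 for anything else (including '=' and whitespace). */
static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode one self-contained base64 string (hypothetical helper).
   Returns the number of bytes written to out, or -1 if the string breaks
   the rules: length not a multiple of 4, a character outside the alphabet,
   or '=' anywhere other than a trailing "=" or "==". */
long b64_decode_strict(const char *in, unsigned char *out)
{
    size_t len = strlen(in);
    size_t pad = 0, o = 0;

    if (len % 4 != 0)
        return -1;
    /* len is 0 or at least 4 here, so in[len - 2] is in range. */
    if (len > 0 && in[len - 1] == '=') {
        pad = 1;
        if (in[len - 2] == '=')
            pad = 2;
    }

    for (size_t i = 0; i < len; i += 4) {
        /* Only the final group may carry padding. */
        size_t data_chars = (i + 4 == len) ? 4 - pad : 4;
        unsigned long v = 0;

        for (size_t j = 0; j < 4; j++) {
            int d = 0;
            if (j < data_chars) {
                d = b64_value((unsigned char) in[i + j]);
                if (d < 0)
                    return -1;   /* stray '=' or foreign character */
            }
            v = (v << 6) | (unsigned long) d;
        }
        out[o++] = (unsigned char) ((v >> 16) & 0xff);
        if (data_chars > 2)
            out[o++] = (unsigned char) ((v >> 8) & 0xff);
        if (data_chars > 3)
            out[o++] = (unsigned char) (v & 0xff);
    }
    return (long) o;
}
```

For instance, b64_decode_strict("RUxGAgEB", buf) writes the six bytes 'E', 'L', 'F', 2, 1, 1, while "RUx" (wrong length), "R=xG" (= in the middle) and "RU G" (a space) are all rejected with -1.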