https://sourceware.org/bugzilla/show_bug.cgi?id=31964

            Bug ID: 31964
           Summary: Add directive for more efficient encoding of binary
                    data
           Product: binutils
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: gas
          Assignee: unassigned at sourceware dot org
          Reporter: jakub at redhat dot com
  Target Milestone: ---

When GCC emits binary data into assembly output today, whether for LTO
sections with -flto or for large variable initializers, it either uses the
.ascii/.string directives, like
        .string "\304\2347a@\355\004\302tL\302\\\260O>6D\347\266\2527`\200\355\004\276\276L\302j\330'\0279D\347\262\2347`\326v\002_%&a-\354S\017\017\3219i\365\006\214\250\235@\027\324$,\205}\352\271!:G\345\274\331\"l'\020\345f\362U\310>\341\350\020\225+\254\336l\305\265\023\210z1\371\002d\237nz\210\312\3759o\266\216\332\t<EM\276\350\330\247\033\034\242r|\325\233\255\234v\002M{&_j\354\223\315\016Q\271;\347\215V];\201&\035\223\2572\366\311\306\206\250\034\\\365FK\251\235\300\322\227\311W\225}\252\311!*\367\346\274\321\332i'\260Tb\362\025\305>\321\360\020\225S\253\336d\345\265\023H\202\232|1\261O47D\345\320\2347YN\355\004\220\334L\276\202\330\247\031\035\242ra\325\233\254\236v\002H/&_=\354\323\263\207\250\334\226\363\246\312\327N"
        .ascii  "\0325\371\242a\237\2368D\345\252\325\233*[;\201\242=\223"
        .string "/\026\366\211\335!*'u\336T\201\332\t\024\351\230|\241\260O\254\r\321\270\303\352\215\025`;\001\234/\223.B\366\251%\207h\\\241\363\306\312\332N"
        .ascii  "\247\304\244\013\217}b\341!\032\347W\275\261\"j'`\0035\351\232"
        .ascii  "c\237Xn\210\306\3619o\244\204\355\004h\334L\272\322\330\247\025"
or emits it as a sequence of .byte directives
        .byte   127
        .byte   69
        .byte   76
        .byte   70
        .byte   2
        .byte   1
        .byte   1
        .byte   3
        .byte   0
        .byte   0
For ASCII or mostly-ASCII data, .string/.ascii are just fine, with 1 or slightly
more than 1 assembly character per data section byte. But as can be seen
above, non-ASCII values take 4 characters per byte, and the .byte sequence
takes up to 11 characters per byte.
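
The per-byte costs above can be reproduced with a small sketch. It assumes
that printable ASCII is emitted as-is, that '"' and '\' are backslash-escaped,
that every other byte becomes a three-digit octal escape, and that each .byte
line is tab, ".byte", tab, decimal value, newline. The function names are
hypothetical, and GCC's real escaping differs in details (it also uses short
escapes such as \t and \r):

```c
#include <stddef.h>
#include <stdio.h>

/* Characters needed for one data byte inside a .string/.ascii literal,
   under the simplified escaping rules assumed above. */
static size_t string_cost(unsigned char c)
{
    if (c == '"' || c == '\\')
        return 2;               /* backslash plus the character itself */
    if (c >= 0x20 && c < 0x7f)
        return 1;               /* printable ASCII, emitted verbatim */
    return 4;                   /* backslash plus three octal digits */
}

/* Characters needed for one "\t.byte\tN\n" line. */
static size_t byte_cost(unsigned char c)
{
    /* snprintf with a NULL buffer and size 0 only computes the length. */
    return (size_t) snprintf(NULL, 0, "\t.byte\t%u\n", (unsigned) c);
}
```

Under these assumptions byte_cost(127) is 11 and string_cost(0304) is 4.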

I've been wondering whether gas could add a .base64 directive. Base64
encoding/decoding is pretty fast and can be implemented efficiently in a few
lines of C or C++ code. It is something I'm also proposing for #embed
preprocessing.
Perhaps the .base64 argument could be a string, like:
        .base64 "RUxGAgEBAwAAAAAAAAAAAgA+AAEAAABQ00AAAAAAAEAAAAAAAAAA2CBLEAAAAAAAAAAAQAA4AA4A"
        .base64 "QAAsACsABgAAAAQAAABAAAAAAAAAAEAAQAAAAAAAQABAAAAAAAAQAwAAAAAAABADAAAAAAAACAAA"
        .base64 "AAAAAAAAAAAAAAAAAAA="
https://datatracker.ietf.org/doc/html/rfc4648#section-4
I'd probably not add any requirement on the line (string) length, and not
accept line breaks or any characters other than [A-Za-z0-9+/=]; just require
that the string length is a multiple of 4, so that each string is valid base64
on its own, with at most = or == at the end and no other = characters.
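
To illustrate how little code such a strict check needs, here is a minimal
sketch of a decoder that enforces the rules above. The names b64_value and
b64_decode_strict are hypothetical and not part of gas. Two points are
assumptions rather than part of the proposal: the empty string is accepted,
and non-zero padding bits in the last group are not rejected.

```c
#include <stddef.h>
#include <string.h>

/* Map one character of the RFC 4648 section 4 alphabet to its 6-bit value.
   Returns -1 for anything else, including '='. */
static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Strictly decode one string argument into out.
   Returns the number of bytes written, or -1 if the string is invalid.
   out must have room for strlen(s) / 4 * 3 bytes. */
static long b64_decode_strict(const char *s, unsigned char *out)
{
    size_t len = strlen(s);
    size_t i, n = 0;

    if (len % 4 != 0)
        return -1;              /* must be a multiple of 4 characters */

    for (i = 0; i < len; i += 4) {
        int last = (i + 4 == len);
        int pad = 0;
        int v[4];
        int j;
        unsigned long triple;

        for (j = 0; j < 4; j++) {
            unsigned char c = (unsigned char) s[i + j];
            if (c == '=') {
                /* '=' only in the last one or two positions of the string */
                if (!last || j < 2)
                    return -1;
                pad++;
                v[j] = 0;
            } else {
                if (pad)
                    return -1;  /* data character after '=' */
                v[j] = b64_value(c);
                if (v[j] < 0)
                    return -1;  /* line break or other foreign character */
            }
        }

        triple = ((unsigned long) v[0] << 18) | ((unsigned long) v[1] << 12)
                 | ((unsigned long) v[2] << 6) | (unsigned long) v[3];
        out[n++] = (unsigned char) ((triple >> 16) & 0xff);
        if (pad < 2)
            out[n++] = (unsigned char) ((triple >> 8) & 0xff);
        if (pad < 1)
            out[n++] = (unsigned char) (triple & 0xff);
    }
    return (long) n;
}
```

With this sketch, "RUxG" decodes to the three bytes "ELF", while "RUx" (length
not a multiple of 4) and "RU=G" (= in the middle) are rejected.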
So even
        .base64 "RUxGAgEBAwAAAAAAAAAAAgA+AAEAAABQ00AAAAAAAEAAAAAAAAAA2CBLEAAAAAAAAAAAQAA4AA4AQAAsACsABgAAAAQAAABAAAAAAAAAAEAAQAAAAAAAQABAAAAAAAAQAwAAAAAAABADAAAAAAAACAAAAAAAAAAAAAAAAAAAAAA="
etc. would be valid.
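
On the producer side, a compiler could generate such strings with an encoder
along these lines. This is only a sketch under the assumption that the
directive takes standard padded base64. The name b64_encode is hypothetical,
and how GCC would split data across directives is not specified here.

```c
#include <stddef.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode len bytes from data into out as NUL-terminated, padded base64.
   out must have room for 4 * ((len + 2) / 3) + 1 characters.
   Returns the number of characters written, not counting the NUL. */
static size_t b64_encode(const unsigned char *data, size_t len, char *out)
{
    size_t i, n = 0;
    unsigned long t;

    /* Full 3-byte groups become 4 characters each. */
    for (i = 0; i + 2 < len; i += 3) {
        t = ((unsigned long) data[i] << 16)
            | ((unsigned long) data[i + 1] << 8)
            | (unsigned long) data[i + 2];
        out[n++] = b64_alphabet[(t >> 18) & 63];
        out[n++] = b64_alphabet[(t >> 12) & 63];
        out[n++] = b64_alphabet[(t >> 6) & 63];
        out[n++] = b64_alphabet[t & 63];
    }

    /* A trailing group of 1 or 2 bytes is padded with == or =. */
    if (len - i == 1) {
        t = (unsigned long) data[i] << 16;
        out[n++] = b64_alphabet[(t >> 18) & 63];
        out[n++] = b64_alphabet[(t >> 12) & 63];
        out[n++] = '=';
        out[n++] = '=';
    } else if (len - i == 2) {
        t = ((unsigned long) data[i] << 16)
            | ((unsigned long) data[i + 1] << 8);
        out[n++] = b64_alphabet[(t >> 18) & 63];
        out[n++] = b64_alphabet[(t >> 12) & 63];
        out[n++] = b64_alphabet[(t >> 6) & 63];
        out[n++] = '=';
    }

    out[n] = '\0';
    return n;
}
```

Encoding the three bytes "ELF" this way yields "RUxG", the prefix of the
example strings above. The output costs 4 characters per 3 data bytes
regardless of the byte values.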
