https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86026
Bug ID: 86026 Summary: Document and/or change allowed operations on integer representation of pointers Product: gcc Version: 8.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: pascal_cuoq at hotmail dot com Target Milestone: --- This report is about GCC's handling of low-level pointer tricks such as the XOR linked list [1]. Uses of this sort of trick are present in legacy C code and in newly written code that push the boundaries of what is possible. If users of GCC are going to apply GCC to this sort of code, they need to know what is allowed. This relates to the discussion about pointer provenance, for which Kayvan Memarian and Peter Sewell wrote a discussion draft in 2016 [3]. Consider the three functions below: char f_ptradd(ptrdiff_t o) { g = 1; *(t + o) = 2; return g; } char f_intadd(ptrdiff_t o) { g = 1; *(char*)((uintptr_t)t + o) = 2; return g; } char f_intxor(ptrdiff_t o) { g = 1; *(char*)((uintptr_t)t ^ o) = 2; return g; } GCC 8.1 produces the following assembly for these functions [2]: f_ptradd: movb $1, g(%rip) movl $1, %eax movb $2, t(%rdi) ret f_intadd: movb $1, g(%rip) movl $1, %eax movb $2, t(%rdi) ret f_intxor: xorq $t, %rdi movb $1, g(%rip) movb $2, (%rdi) movzbl g(%rip), %eax ret The third function does exactly what a XOR linked list implementation does. Sufficiently smart inlining and constant propagation would turn a generic implementation into f_intxor. GCC 8.1 and earlier versions compile the first two functions f_ptradd and f_intadd as if they could never be invoked as in the following main: int main(void) { f_ptradd((uintptr_t) &g - (uintptr_t) t); f_intadd((uintptr_t) &g - (uintptr_t) t); f_intxor((uintptr_t) &g ^ (uintptr_t) t); } This is fair in the case of f_ptradd, which invokes undefined behavior at “t + o”. However, I would have expected f_intadd to be compiled conservatively, because it is possible to pass a value o that, added to the integer representation of (char*)t, produces the integer representation of g. Of course, it is for the GCC developers to decide exactly what is allowed and what isn't after a pointer is converted to an integer. But without an explicit explanation in GCC's documentation of what makes f_intadd and f_intxor different, we have to assume that the latter is no more supported than the former, and that programs that use XOR linked lists or similar data structures cannot be compiled by GCC. [1] https://en.wikipedia.org/wiki/XOR_linked_list [2] https://godbolt.org/g/DYCpjS [3] http://www.cl.cam.ac.uk/~pes20/cerberus/n2090.html