Thanks for the answer, and sorry for the slow follow-up. Got distracted by other things...

Jeff Law <l...@redhat.com> writes:
> On Sat, 2020-01-25 at 09:31 +0000, Richard Sandiford wrote:
>> TL;DR: if we have two bare SYMBOL_REFs X and Y, neither of which have an
>> associated source-level decl and neither of which are in an anchor block:
>>
>> (Q1) can a valid byte access at X+C alias a valid byte access at Y+C?
>>
>> (Q2) can a valid byte access at X+C1 alias a valid byte access at Y+C2,
>>      C1 != C2?
>>
>> Also:
>>
>> (Q3) If X has a source-level decl and Y doesn't, and neither of them are
>>      in an anchor block, can valid accesses based on X alias valid accesses
>>      based on Y?
> So what are the cases where Y won't have a source level decl but we
> have a decl in RTL? anchors, other cases?

Not really sure why I wrote "source-level" TBH. I was really talking
about any symbol that has a SYMBOL_REF_DECL. I think there are three
"interesting" cases:

  - symbols with a SYMBOL_REF_DECL
  - anchor symbols
  - bare symbols (i.e. everything else)

Bare symbols are hopefully rare these days.

>> (well, OK, that wasn't too short either...)
> I would have thought the answer would be "no" across the board. But
> the code clearly indicates otherwise.
>
> Interposition clearly complicates things as do explicit aliases though.
>
>> This part seems obvious enough. But then, apart from the special case of
>> forced address alignment, we use an offset-based check even for cmp==-1:
>>
>>       /* Assume a potential overlap for symbolic addresses that went
>>          through alignment adjustments (i.e., that have negative
>>          sizes), because we can't know how far they are from each
>>          other. */
>>       if (maybe_lt (xsize, 0) || maybe_lt (ysize, 0))
>>         return -1;
>>       /* If decls are different or we know by offsets that there is no
>>          overlap, we win. */
>>       if (!cmp || !offset_overlap_p (c, xsize, ysize))
>>         return 0;
>>
>> So we seem to be taking cmp==-1 to mean that although we don't know
>> the relationship between the symbols, it must be the case that either
>> (a) the symbols are equal (e.g.
>> via aliasing) or (b) the accesses are
>> to non-overlapping objects. In other words, one of the situations
>> described by cmp==1 or cmp==0 must be true, but we don't know which
>> at compile time.
> Right. That was the conclusion I came to. If a SYMBOL_REF has an
> alias, the alias must have the same value as the SYMBOL_REF. So they're
> either equal or there's no valid case for overlap.
>
>>
>> This means that in practice, the answer to (Q1) appears to be "yes"
>> but the answer to (Q2) appears to be "no".
> That would be my understanding once aliases/interpositioning come into
> play.
>
>>
>> This somewhat contradicts:
>>
>>   /* In general we assume that memory locations pointed to by different
>>      labels may overlap in undefined ways. */
>>   return -1;
>>
>> at the end of compare_base_symbol_refs, which seems to be saying
>> that the answer to (Q2) ought to be "yes" instead. Which is right?
> I'm not sure how we could get to yes in that case. A symbol alias or
> interposition ultimately still results in two symbols having the same
> final address. Thus for a byte access if C1 != C2, then we can't have
> an overlap.

I think it's handling cases in which one symbol is a bare symbol (has no
decl and isn't an anchor). I assumed the idea was that we could have a
decl-less SYMBOL_REF for the start of a particular section, or things
like that.

>> In PR92294 we have a symbol X at ANCHOR+OFFSET that's preemptible.
>> Under the (Q1)==yes/(Q2)==no assumption, cmp==-1 means that either
>> (a) X = ANCHOR+OFFSET or (b) X and ANCHOR reference non-overlapping
>> objects. So we should take the offset into account when doing:
>>
>>       if (!cmp || !offset_overlap_p (c, xsize, ysize))
>>         return 0;
>>
>> Let's call this FIX1.
> So this is a really interesting wrinkle. Doesn't this change Q2 to a
> yes? In particular it changes the "invariant" that the symbols have
> the same address in the event of a symbol alias or interposition.
> Of course one could ask the question of whether or not we should handle
> cases with anchors specially.

This wouldn't come under Q2, since that was about symbols that aren't
in an anchor block. I think it just means we need to generalise the
three cases that don't involve bare symbols from:

  - known equal
  - independent
  - equal or independent

to:

  - known distance apart
  - independent
  - known distance apart or independent

It's fortunate that anchors themselves can't be interposed. :-)

>> But that then brings us to: why does memrefs_conflict_p return -1
>> when one symbol X has a decl and the other symbol Y doesn't, and neither
>> of them are block symbols? Is the answer to (Q3) that we allow equality
>> but not overlap here too? E.g. a linker script could define Y to X but
>> not to a region that contains X at a nonzero offset?
> Does digging into the history provide any insights here?

Not that I could see. The code in question was part of a single patch.

> I'm not sure given the issues you've introduced if I could actually
> fill out the matrix of answers without more underlying information.
> ie, when can we get symbols without source level decls,
> anchors+interposition issues, etc.

OK. In that case, I wonder whether it would be safer to have a
fourth state on top of the three above:

  - known distance apart
  - independent
  - known distance apart or independent
  - don't know

with "don't know" being anything that involves bare symbols?

Richard
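
To make the four-state idea a bit more concrete, here is a rough
standalone sketch in plain C of how a byte-level query might use it.
This is only an illustration under stated assumptions: every name in it
(base_rel, the REL_* values, byte_accesses_may_alias) is hypothetical
and nothing like it exists in alias.c today, and "distance" is assumed
to mean Y - X whenever the state says that value is meaningful.

```c
#include <stdbool.h>

/* Hypothetical four-state relationship between two base symbols X and Y.
   None of these names come from GCC; they only mirror the list above.  */
enum base_rel
{
  REL_KNOWN_DISTANCE,    /* Y - X is a known constant.  */
  REL_INDEPENDENT,       /* The objects cannot overlap.  */
  REL_DISTANCE_OR_INDEP, /* One of the two above, unknown which.  */
  REL_DONT_KNOW          /* Anything that involves a bare symbol.  */
};

/* Return true if a valid byte access at X + C1 might alias a valid byte
   access at Y + C2.  DISTANCE is the assumed value of Y - X and is only
   consulted for the two states in which a distance is meaningful.  */
static bool
byte_accesses_may_alias (enum base_rel rel, long distance, long c1, long c2)
{
  switch (rel)
    {
    case REL_INDEPENDENT:
      return false;

    case REL_KNOWN_DISTANCE:
    case REL_DISTANCE_OR_INDEP:
      /* If the bases are related at all, Y = X + DISTANCE, so the two
         bytes coincide only when X + C1 == X + DISTANCE + C2.  */
      return c1 == distance + c2;

    case REL_DONT_KNOW:
    default:
      /* No usable information, so be conservative.  */
      return true;
    }
}
```

With distance == 0, REL_DISTANCE_OR_INDEP behaves like the old "equal
or independent" case, i.e. (Q1) is "yes" and (Q2) is "no", while
REL_DONT_KNOW answers "yes" to both.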