On Fri, Oct 20, 2017 at 1:34 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Fri, Oct 20, 2017 at 1:19 PM, Andreas Krebbel
> <kreb...@linux.vnet.ibm.com> wrote:
>> On 10/20/2017 10:28 AM, Richard Biener wrote:
>>> On Fri, Oct 20, 2017 at 9:53 AM, Jakub Jelinek <ja...@redhat.com> wrote:
>>>> On Fri, Oct 20, 2017 at 09:48:38AM +0200, Richard Biener wrote:
>>>>> How does it work semantically to have different exec charsets?  That is,
>>>>> if "strings" flow from a region with one -fexec-charset setting to a
>>>>> region with another one is that undefined behavior?  Do we now require
>>>>> external function declarations to be in the proper region (declared under
>>>>> the appropriate exec charset flag)?  This would mean that passing
>>>>> the exec charset in effect as additional argument isn't a possibility.
>>>>>
>>>>> Or do we have to treat -fexec-charset similar to -frounding-math, that is,
>>>>> we can't ever _interpret_ any string in the compiler?  [unless
>>>>> -fexec-charset is the same everywhere]
>>>>>
>>>>> I think the -frounding-math route is probably the easiest (and wisest
>>>>> given the quite low test coverage we'll get) route.  Thus, add a
>>>>> -fmixed-charset flag and reject any exec-charset attribute/pragma if
>>>>> that flag is not set?
>>>>> With LTO we could always add this and/or merge -fexec-charset flags
>>>>> appropriately, injecting -fmixed-charset in case TUs use different
>>>>> settings.
>>>>
>>>> It wouldn't have to be an option, simply mark in cfun all functions that
>>>> have more than one exec charset and give up on all optimizations/warnings
>>>> that require to read the characters and merge that unknown exec_charset
>>>> flag during inlining etc.  Though, that might still not be enough, e.g.
>>>> the whole function might have one exec charset, but a global const char []
>>>> variable might have another one and during optimization we might be looking
>>>> at that.
>>>> So perhaps it would need to be a per-TU flag merged during LTO.
>>>
>>> There's also IPA flow of strings between functions so unless mixing
>>> exec charsets invokes undefined behavior I can't see how a per-function
>>> flag would help.
>>>
>>> But yes, if we can reliably detect whether multiple exec charsets are
>>> used in a TU we can make this a flag that doesn't have to be set by the
>>> user.  But that means the pragma probably _always_ forces that flag given
>>> we have that forced pre-included file on some targets and the pragma
>>> token would occur after that...
>>
>> Would it make sense to mark the string literals themselves as not using
>> the default charset?  Then we could disable all interpretations only for
>> these strings instead of disabling it for the entire TU?
>
> I think that would work, too.  Though I'd then rather explicitly
> state the charset the string literal is in (for efficiency we'd then
> need some mapping of charset id to actual charset we store globally
> somewhere and which we'd need to stream and merge for LTO - the
> "default" would then always get zero and the default charset being
> streamed to LTO).  Looks like tree_base.u.bits is unused for STRING_CST
> in the middle-end, you'd have to check FEs if they use a lang-specific
> flag though.  Then we could stick the exec charset number there (32bit
> index even - whoo).  Bah, C++ of course uses a single lang flag
> (PAREN_STRING_LITERAL_P).  Sticking it in the literal's type would work
> as well but I find that a bit ugly.  We could reuse bits.address_space
> for a max of 256 exec charsets, a special value of 255 could indicate
> 'unknown, too many charsets' also used in an initial implementation
> without providing the actual mapping just distinguishing default from
> non-default.
>
> The interesting part is of course libcpp/cc1 interaction and getting
> this all right.

Oh, and there are plenty of bits unused for STRING_CST so if the C++ FE
could stop using lang specific tree bits we could shrink tree_string by
moving length to tree_base.u.  Re-using address_space would block this
improvement.  Finding a single bit for default vs. non-default wouldn't.

Richard.

> Richard.
>
>> -Andreas-
>>
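
To make the charset-id idea from the thread concrete, here is a rough,
standalone sketch in plain C.  It is not GCC code: the names
(charset_table, charset_intern, charset_merge, charset_can_interpret) are
made up for illustration, and it assumes an 8-bit id where 0 is the
default exec charset and 255 means 'unknown, too many charsets'.  It
shows a per-TU table mapping ids to charset names, and a merge step of
the kind LTO would need when units were compiled with different
settings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; none of this is existing GCC code.  */
enum { CHARSET_DEFAULT = 0, CHARSET_UNKNOWN = 255, CHARSET_MAX = 255 };

struct charset_table {
  const char *names[CHARSET_MAX]; /* names[0] is the default exec charset */
  unsigned count;                 /* number of valid entries */
};

/* Initialize T so that id 0 maps to DEFAULT_NAME.  */
static void
charset_table_init (struct charset_table *t, const char *default_name)
{
  assert (default_name != NULL);
  memset (t, 0, sizeof *t);
  t->names[0] = default_name;
  t->count = 1;
}

/* Return the id of NAME in T, adding it if it is new.  Returns
   CHARSET_UNKNOWN once all 255 usable ids (0..254) are taken.  */
static uint8_t
charset_intern (struct charset_table *t, const char *name)
{
  for (unsigned i = 0; i < t->count; i++)
    if (strcmp (t->names[i], name) == 0)
      return (uint8_t) i;
  if (t->count >= CHARSET_MAX)
    return CHARSET_UNKNOWN;
  t->names[t->count] = name;
  return (uint8_t) t->count++;
}

/* Conservative rule for an initial implementation: only literals in
   the default charset may be interpreted by the compiler.  */
static int
charset_can_interpret (uint8_t id)
{
  return id == CHARSET_DEFAULT;
}

/* Merge UNIT's table into MERGED.  Afterwards a literal carrying the
   unit-local id I has the merged id REMAP[I].  A unit whose default
   differs from MERGED's default ends up with a non-zero id there.  */
static void
charset_merge (struct charset_table *merged,
               const struct charset_table *unit,
               uint8_t remap[CHARSET_MAX])
{
  for (unsigned i = 0; i < unit->count; i++)
    remap[i] = charset_intern (merged, unit->names[i]);
}
```

With only a single default/non-default bit on STRING_CST, the table and
the merge step go away and only the charset_can_interpret-style check
remains; the sketch above corresponds to the variant that records the
actual charset per literal.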