erichkeane added inline comments.
================
Comment at: clang/lib/Sema/SemaChecking.cpp:433
+
+ llvm::StringRef getFormatSpecifier(QualType T) {
+ if (auto *BT = T->getAs<BuiltinType>()) {
----------------
rsmith wrote:
> yihanaa wrote:
> > rsmith wrote:
> > > rsmith wrote:
> > > > aaron.ballman wrote:
> > > > > yihanaa wrote:
> > > > > > I think this is better maintained in "clang/AST/FormatString.h".
> > > > > > For example analyze_printf::PrintfSpecifier can get format
> > > > > > specifier, what do you all think about?
> > > > > +1 to this suggestion -- my hope is that we could generalize it more
> > > > > then as I notice there are missing specifiers for things like
> > > > > intmax_t, size_t, ptrdiff_t, _Decimal types, etc. Plus, that will
> > > > > hopefully make it easier for __builtin_dump_struct to benefit when
> > > > > new format specifiers are added, such as ones for printing a _BitInt.
> > > > I am somewhat uncertain: every one of these is making arbitrary choices
> > > > about how to format the value, so it's not clear to me that this is
> > > > general logic rather than something specific to
> > > > `__builtin_dump_struct`. For example, using `%f` rather than one of the
> > > > myriad other ways of formatting `double` is somewhat arbitrary. Using
> > > > `%s` for any `const char*` is *extremely* arbitrary and will be wrong
> > > > and will cause crashes in some cases, but it may be the pragmatically
> > > > correct choice for a dumping utility. A general-purpose mechanism would
> > > > use `%p` for all kinds of pointer.
> > > >
> > > > We could perhaps factor out the formatting for cases where there is a
> > > > clear canonical default formatting, such as for integer types and
> > > > probably `%p` for pointers, then call that from here with some
> > > > special-casing, but without a second consumer for that functionality
> > > > it's really not clear to me what form it should take.
> > > I went ahead and did this, mostly to match concurrent changes to the old
> > > implementation. There are a few cases where our existing "guess a format
> > > specifier" logic does the wrong thing for dumping purposes, which I've
> > > explicitly handled -- things like wanting to dump a `char` / `signed
> > > char` / `unsigned char` member as a number rather than as a (potentially
> > > non-printable or whitespace) character.
> > When I was patching that old implementation, I found that for uint8_t,
> > int8_t, Clang's existing "guess a format specifier" logic would treat the
> > value as an integer, but for unsigned char, signed char, char types, it
> > would Treat it as a character, please look at this example (
> > https://godbolt.org/z/ooqn4468T ), I guess this existing logic may have
> > made some special judgment.
> Yeah. I think in the case where we see some random call to `printf`, `%c` is
> probably the right thing to guess here, but it doesn't seem appropriate to me
> to use this in a dumping routine. If we could dump as `'x'` for printable
> characters and as `'\xAB'` for everything else, that'd be great, but `printf`
> can't do that itself and I'm not sure we want to be injecting calls to
> `isprint` or whatever to make that work. Dumping as an integer always seems
> like it's probably the least-bad option.
>
> Somewhat related: I wonder if we should use `"\"%s\""` instead of simply
> `"%s"` when dumping a `const char*`. That's not ideal but probably clearer
> than the current dump output.
I see value to having strings with SOME level of delimiter, if at least to
handle cases when the string itself has a newline in it.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D124221/new/
https://reviews.llvm.org/D124221
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits