[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon
djasper added a comment. In https://reviews.llvm.org/D42729#1022069, @Typz wrote: > In https://reviews.llvm.org/D42729#994841, @djasper wrote: > > > - Of course you find all sorts of errors while testing clang-format on a > > large-enough codebase. That doesn't mean that users run into them much. > > - We have had about 10k clang-format users internally for several years. > > The semicolon issue comes up but really rarely and if it does, people > > happily fix their code not blaming clang-format. > > > > Unrelated, my point remains that setting BlockKind in TokenAnnotator is > > bad enough that I wouldn't want to do it for reaping this small benefit. > > And I can't see how you could easily achieve the same thing without doing > > that. > > > Just a question though. I there a reason brace matching (and other parts of > TokenAnnotations) are not performed before LineUnwrapping? That would > probably allow fixing this issue more cleanly (though I am not sure I would > have the time to actually perform this probably significant task)... I think this is just the way it has grown. And brace/paren matching is actually done in both places several times by now, I think :(. Fundamentally, the UnwrappedLineParser had the task of splitting a source file into lines and only implemented what it needed for that. The TokenAnnotator then did a much more elaborate analysis on each line to determine token types and such and distinguish what a "<" actually is (comparison vs. template opener). Having these two things be separate makes it slightly easier for error recovery and such as the state space they can be in is somewhat more limited. Repository: rC Clang https://reviews.llvm.org/D42729 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon
Typz added a comment. In https://reviews.llvm.org/D42729#994841, @djasper wrote: > - Of course you find all sorts of errors while testing clang-format on a > large-enough codebase. That doesn't mean that users run into them much. > - We have had about 10k clang-format users internally for several years. The > semicolon issue comes up but really rarely and if it does, people happily fix > their code not blaming clang-format. > > Unrelated, my point remains that setting BlockKind in TokenAnnotator is bad > enough that I wouldn't want to do it for reaping this small benefit. And I > can't see how you could easily achieve the same thing without doing that. Just a question though. I there a reason brace matching (and other parts of TokenAnnotations) are not performed before LineUnwrapping? That would probably allow fixing this issue more cleanly (though I am not sure I would have the time to actually perform this probably significant task)... Repository: rC Clang https://reviews.llvm.org/D42729 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon
djasper added a comment. - Of course you find all sorts of errors while testing clang-format on a large-enough codebase. That doesn't mean that users run into them much. - We have had about 10k clang-format users internally for several years. The semicolon issue comes up but really rarely and if it does, people happily fix their code not blaming clang-format. Unrelated, my point remains that setting BlockKind in TokenAnnotator is bad enough that I wouldn't want to do it for reaping this small benefit. And I can't see how you could easily achieve the same thing without doing that. Repository: rC Clang https://reviews.llvm.org/D42729 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon
Typz added a comment. > I don't think cases where there is only a semicolon-less macro are > particularly frequent/important either. You probably could even add a > semicolon in those cases, right? How frequent are such cases in your > codebase? Either way, they aren't fixed by this patch, so they aren't a good > reason to move forward with it. Adding a semicolon works for indentation and behavior, but leads to compiler warning. So a proper fix would require to change the macro, which can easily be tricky with macros containings branches or loops... It should not happen that often, but in reality it actually happens often enough that I found the bug while simply testing clang-format... > I still believe that this patch adds complexity for very little gain. And I > am not even sure it is correct. > isFunctionDeclarationName/getFunctionParameterList is just yet another > heuristic that might go wrong. And it might go wrong in both ways. You might > still miss some cases and you might start incorrectly formatting stuff as > functions. Fundamentally clang-format just doesn't have enough info to do > this correctly. As such, it does not add such complexity I think, but indeed it is not completely correct: it works only in the case where no line breaks are expected around the body (since it is still parsed as a single UnwrappedLine). So it is a slight improvement, but not a complete fix. `isFunctionDeclarationName()` is an heuristic as well, but provides better result and does not break the existing one (since there are actually quite a few tests, and none get broken by this patch :-) ) I think "overall" clang-format has enough information for this case, it is just not available at the right time: Unwrapping is done first, then token and matching parenthesis are analysed and lines anotated, and these token anotations are needed to improve the unwrapping behavior... I would really want to call `isFunctionDeclarationName()` from `UnwrappedLineParser::calculateBraceTypes()`, but parenthesis matching, nesting level and operators identification are not available yet. > ".. which can easily be overlooked". If they are overlooked, nobody cares > about the space either, so no harm done ;). We want to use a formatter to ensure the code is consistent: e.g. to ensure things that may be overlooked by a human are actually formatted according to the rules. We want people to trust the tool to do the right thing, so it is best to avoid as many errors as possible... Repository: rC Clang https://reviews.llvm.org/D42729 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon
djasper added a comment. I don't think cases where there is only a semicolon-less macro are particularly frequent/important either. You probably could even add a semicolon in those cases, right? How frequent are such cases in your codebase? Either way, they aren't fixed by this patch, so they aren't a good reason to move forward with it. I still believe that this patch adds complexity for very little gain. And I am not even sure it is correct. isFunctionDeclarationName/getFunctionParameterList is just yet another heuristic that might go wrong. And it might go wrong in both ways. You might still miss some cases and you might start incorrectly formatting stuff as functions. Fundamentally clang-format just doesn't have enough info to do this correctly. In https://reviews.llvm.org/D42729#993188, @Typz wrote: > In https://reviews.llvm.org/D42729#993159, @djasper wrote: > > > I think this case is not important enough to fix. Please tell users to just > > delete the useless semicolon. > > > I would agree if were simple to spot: but often this may manifest itself only > with a missing space between the function parameters and the function body, > which can easily be overlooked... ".. which can easily be overlooked". If they are overlooked, nobody cares about the space either, so no harm done ;). > Another option may be to create new pass which "removes" that extra > semicolon: this way we would both fix it and get things right on next pass. I am not sure we can do this reliably enough. > However, the issue with a function which contains only a macro and which is > followed by another function which returns an custom type cannot so easily be > fixed: > > void abort() { > FOO() > } > uint32_t bar() {} > > > (note that this case works fine if the body of the function contains a > semicolon or reserved keyword, or if the next function returns a base type > [int, void...]) Repository: rC Clang https://reviews.llvm.org/D42729 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon
Typz added a comment. In https://reviews.llvm.org/D42729#993159, @djasper wrote: > I think this case is not important enough to fix. Please tell users to just > delete the useless semicolon. I would agree if were simple to spot: but often this may manifest itself only with a missing space between the function parameters and the function body, which can easily be overlooked... Another option may be to create new pass which "removes" that extra semicolon: this way we would both fix it and get things right on next pass. However, the issue with a function which contains only a macro and which is followed by another function which returns an custom type cannot so easily be fixed: void abort() { FOO() } uint32_t bar() {} (note that this case works fine if the body of the function contains a semicolon or reserved keyword, or if the next function returns a base type [int, void...]) Repository: rC Clang https://reviews.llvm.org/D42729 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon
Typz added a comment. There are actually other cases, e.g. with macros "containing" the semicolon: void abort() { FOO() BAR() }; gets reformatted to this (still wrong with this patch, but the space after the parenthesis is added): void abort(){ FOO() BAR() }; And also this one (though it may be slightly different, there is no semicolon here): void abort() { FOO() } uint32_t bar() {} gets reformatted to: void abort(){ FOO() BAR() } uint32_t bar() {} IMO the "real" fix would be to somehow let `TokenAnnotator::calculateFormattingInformation` split the current UnwrappedLine/AnnotatedLine, so that the 'end of the line' could be unwrapped properly: this would also allow keeping the invariant, and just implement some kind of backtracking. But it may be tricky to get the state right; maybe adding some child lines instead of just adding the lines may be simpler? Repository: rC Clang https://reviews.llvm.org/D42729 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon
djasper added a comment. I think this case is not important enough to fix. Please tell users to just delete the useless semicolon. In particular, my concern is that this makes the data-flow significantly more complicated. Early on in the development, we had separate token classes for the different states of the analysis (unwrapped line parsing, annotation, formatting) with const members. We had to move away from that as it didn't carry the costs that it had. However, this change changes the behavior from the BlockKind only ever being set inside the UnwrappedLineParser and being consumed later (e.g. in the TokenAnnotator) to a state where it is set in many places and setting and changing the value are inter-mingled. That makes it much harder to understand the invariants. E.g. how can we ensure that the incorrect BK_BracedInit hasn't already be consumed by something (now and in the future). Repository: rC Clang https://reviews.llvm.org/D42729 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon
Typz added inline comments. Comment at: lib/Format/TokenAnnotator.cpp:1911 + if (Next && Next->is(tok::l_brace) && Next->BlockKind == BK_BracedInit) + Next->BlockKind = BK_Block; +} this may actually not be enough in all cases: to completely match the 'standard' behavior we should be able to "split" the UnwrappedLine at both opening and ending brace, in this case. Repository: rC Clang https://reviews.llvm.org/D42729 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon
Typz updated this revision to Diff 132130. Typz added a comment. update commit message Repository: rC Clang https://reviews.llvm.org/D42729 Files: lib/Format/TokenAnnotator.cpp unittests/Format/FormatTest.cpp Index: unittests/Format/FormatTest.cpp === --- unittests/Format/FormatTest.cpp +++ unittests/Format/FormatTest.cpp @@ -11386,6 +11386,15 @@ "};"); } +TEST_F(FormatTest, MunchSemicolonAfterFunction) { + verifyFormat("void foo() { int i; };"); + verifyFormat("void foo() {\n" + " int i;\n" + " int j;\n" + "};"); + verifyFormat("void foo() {};"); +} + TEST_F(FormatTest, ConfigurableContinuationIndentWidth) { FormatStyle TwoIndent = getLLVMStyleWithColumns(15); TwoIndent.ContinuationIndentWidth = 2; Index: lib/Format/TokenAnnotator.cpp === --- lib/Format/TokenAnnotator.cpp +++ lib/Format/TokenAnnotator.cpp @@ -1783,9 +1783,12 @@ } // This function heuristically determines whether 'Current' starts the name of a -// function declaration. -static bool isFunctionDeclarationName(const FormatToken &Current, - const AnnotatedLine &Line) { +// function declaration, and returns the first token of parameters list in that +// case. +// @return The token which opens the parameters list (e.g. the opening +// parenthesis), or nullptr if this is not a function declaration. +static const FormatToken *getFunctionParameterList(const FormatToken &Current, + const AnnotatedLine &Line) { auto skipOperatorName = [](const FormatToken *Next) -> const FormatToken * { for (; Next; Next = Next->Next) { if (Next->is(TT_OverloadedOperatorLParen)) @@ -1809,58 +1812,58 @@ const FormatToken *Next = Current.Next; if (Current.is(tok::kw_operator)) { if (Current.Previous && Current.Previous->is(tok::coloncolon)) - return false; + return nullptr; Next = skipOperatorName(Next); } else { if (!Current.is(TT_StartOfName) || Current.NestingLevel != 0) - return false; + return nullptr; for (; Next; Next = Next->Next) { if (Next->is(TT_TemplateOpener)) { Next = Next->MatchingParen; } else if (Next->is(tok::coloncolon)) { Next = Next->Next; if (!Next) - return false; + return nullptr; if (Next->is(tok::kw_operator)) { Next = skipOperatorName(Next->Next); break; } if (!Next->is(tok::identifier)) - return false; + return nullptr; } else if (Next->is(tok::l_paren)) { break; } else { -return false; +return nullptr; } } } // Check whether parameter list can belong to a function declaration. if (!Next || !Next->is(tok::l_paren) || !Next->MatchingParen) -return false; +return nullptr; // If the lines ends with "{", this is likely an function definition. if (Line.Last->is(tok::l_brace)) -return true; +return Next; if (Next->Next == Next->MatchingParen) -return true; // Empty parentheses. +return Next; // Empty parentheses. // If there is an &/&& after the r_paren, this is likely a function. if (Next->MatchingParen->Next && Next->MatchingParen->Next->is(TT_PointerOrReference)) -return true; +return Next; for (const FormatToken *Tok = Next->Next; Tok && Tok != Next->MatchingParen; Tok = Tok->Next) { if (Tok->is(tok::l_paren) && Tok->MatchingParen) { Tok = Tok->MatchingParen; continue; } if (Tok->is(tok::kw_const) || Tok->isSimpleTypeSpecifier() || Tok->isOneOf(TT_PointerOrReference, TT_StartOfName, tok::ellipsis)) - return true; + return Next; if (Tok->isOneOf(tok::l_brace, tok::string_literal, TT_ObjCMethodExpr) || Tok->Tok.isLiteral()) - return false; + return nullptr; } - return false; + return nullptr; } bool TokenAnnotator::mustBreakForReturnType(const AnnotatedLine &Line) const { @@ -1899,8 +1902,14 @@ FormatToken *Current = Line.First->Next; bool InFunctionDecl = Line.MightBeFunctionDecl; while (Current) { -if (isFunctionDeclarationName(*Current, Line)) +if (const FormatToken *Tok = getFunctionParameterList(*Current, Line)) { Current->Type = TT_FunctionDeclarationName; + + // Fix block kind, if it was mistakenly identified as a BracedInit + FormatToken *Next = Tok->MatchingParen->Next; + if (Next && Next->is(tok::l_brace) && Next->BlockKind == BK_BracedInit) + Next->BlockKind = BK_Block; +} if (Current->is(TT_LineComment)) { if (Current->Previous->BlockKind == BK_BracedInit && Current->Previous->opensScope()) ___ cfe-commits mailing list cfe-commi