[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon

2018-02-28 Thread Daniel Jasper via Phabricator via cfe-commits
djasper added a comment.

In https://reviews.llvm.org/D42729#1022069, @Typz wrote:

> In https://reviews.llvm.org/D42729#994841, @djasper wrote:
>
> > - Of course you find all sorts of errors while testing clang-format on a 
> > large-enough codebase. That doesn't mean that users run into them much.
> > - We have had about 10k clang-format users internally for several years. 
> > The semicolon issue comes up but really rarely and if it does, people 
> > happily fix their code not blaming clang-format.
> >
> >   Unrelated, my point remains that setting BlockKind in TokenAnnotator is 
> > bad enough that I wouldn't want to do it for reaping this small benefit. 
> > And I can't see how you could easily achieve the same thing without doing 
> > that.
>
>
> Just a question though. I there a reason brace matching (and other parts of 
> TokenAnnotations) are not performed before LineUnwrapping? That would 
> probably allow fixing this issue more cleanly (though I am not sure I would 
> have the time to actually perform this probably significant task)...


I think this is just the way it has grown. And brace/paren matching is actually 
done in both places several times by now, I think :(.

Fundamentally, the UnwrappedLineParser had the task of splitting a source file 
into lines and only implemented what it needed for that. The TokenAnnotator 
then did a much more elaborate analysis on each line to determine token types 
and such and distinguish what a "<" actually is (comparison vs. template 
opener). Having these two things be separate makes it slightly easier for error 
recovery and such as the state space they can be in is somewhat more limited.


Repository:
  rC Clang

https://reviews.llvm.org/D42729



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon

2018-02-28 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

In https://reviews.llvm.org/D42729#994841, @djasper wrote:

> - Of course you find all sorts of errors while testing clang-format on a 
> large-enough codebase. That doesn't mean that users run into them much.
> - We have had about 10k clang-format users internally for several years. The 
> semicolon issue comes up but really rarely and if it does, people happily fix 
> their code not blaming clang-format.
>
>   Unrelated, my point remains that setting BlockKind in TokenAnnotator is bad 
> enough that I wouldn't want to do it for reaping this small benefit. And I 
> can't see how you could easily achieve the same thing without doing that.


Just a question though. I there a reason brace matching (and other parts of 
TokenAnnotations) are not performed before LineUnwrapping? That would probably 
allow fixing this issue more cleanly (though I am not sure I would have the 
time to actually perform this probably significant task)...


Repository:
  rC Clang

https://reviews.llvm.org/D42729



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon

2018-02-01 Thread Daniel Jasper via Phabricator via cfe-commits
djasper added a comment.

- Of course you find all sorts of errors while testing clang-format on a 
large-enough codebase. That doesn't mean that users run into them much.
- We have had about 10k clang-format users internally for several years. The 
semicolon issue comes up but really rarely and if it does, people happily fix 
their code not blaming clang-format.

Unrelated, my point remains that setting BlockKind in TokenAnnotator is bad 
enough that I wouldn't want to do it for reaping this small benefit. And I 
can't see how you could easily achieve the same thing without doing that.


Repository:
  rC Clang

https://reviews.llvm.org/D42729



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon

2018-02-01 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

> I don't think cases where there is only a semicolon-less macro are 
> particularly frequent/important either. You probably could even add a 
> semicolon in those cases, right? How frequent are such cases in your 
> codebase? Either way, they aren't fixed by this patch, so they aren't a good 
> reason to move forward with it.

Adding a semicolon works for indentation and behavior, but leads to compiler 
warning.
So a proper fix would require to change the macro, which can easily be tricky 
with macros containings branches or loops...

It should not happen that often, but in reality it actually happens often 
enough that I found the bug while simply testing clang-format...

> I still believe that this patch adds complexity for very little gain. And I 
> am not even sure it is correct. 
> isFunctionDeclarationName/getFunctionParameterList is just yet another 
> heuristic that might go wrong. And it might go wrong in both ways. You might 
> still miss some cases and you might start incorrectly formatting stuff as 
> functions. Fundamentally clang-format just doesn't have enough info to do 
> this correctly.

As such, it does not add such complexity I think, but indeed it is not 
completely correct: it works only in the case where no line breaks are expected 
around the body (since it is still parsed as a single UnwrappedLine). So it is 
a slight improvement, but not a complete fix.

`isFunctionDeclarationName()` is an heuristic as well, but provides better 
result and does not break the existing one (since there are actually quite a 
few tests, and none get broken by this patch :-) )

I think "overall" clang-format has enough information for this case, it is just 
not available at the right time: Unwrapping is done first, then token and 
matching parenthesis are analysed and lines anotated, and these token 
anotations are needed to improve the unwrapping behavior...
I would really want to call `isFunctionDeclarationName()` from 
`UnwrappedLineParser::calculateBraceTypes()`, but parenthesis matching, nesting 
level and operators identification are not available yet.

> ".. which can easily be overlooked". If they are overlooked, nobody cares 
> about the space either, so no harm done ;).

We want to use a formatter to ensure the code is consistent: e.g. to ensure 
things that may be overlooked by a human are actually formatted according to 
the rules.
We want people to trust the tool to do the right thing, so it is best to avoid 
as many errors as possible...


Repository:
  rC Clang

https://reviews.llvm.org/D42729



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon

2018-01-31 Thread Daniel Jasper via Phabricator via cfe-commits
djasper added a comment.

I don't think cases where there is only a semicolon-less macro are particularly 
frequent/important either. You probably could even add a semicolon in those 
cases, right? How frequent are such cases in your codebase? Either way, they 
aren't fixed by this patch, so they aren't a good reason to move forward with 
it.

I still believe that this patch adds complexity for very little gain. And I am 
not even sure it is correct. isFunctionDeclarationName/getFunctionParameterList 
is just yet another heuristic that might go wrong. And it might go wrong in 
both ways. You might still miss some cases and you might start incorrectly 
formatting stuff as functions. Fundamentally clang-format just doesn't have 
enough info to do this correctly.

In https://reviews.llvm.org/D42729#993188, @Typz wrote:

> In https://reviews.llvm.org/D42729#993159, @djasper wrote:
>
> > I think this case is not important enough to fix. Please tell users to just 
> > delete the useless semicolon.
>
>
> I would agree if were simple to spot: but often this may manifest itself only 
> with a missing space between the function parameters and the function body, 
> which can easily be overlooked...


".. which can easily be overlooked". If they are overlooked, nobody cares about 
the space either, so no harm done ;).

> Another option may be to create new pass which "removes" that extra 
> semicolon: this way we would both fix it and get things right on next pass.

I am not sure we can do this reliably enough.

> However, the issue with a function which contains only a macro and which is 
> followed by another function which returns an custom type cannot so easily be 
> fixed:
> 
>   void abort() {
> FOO()
>   }
>   uint32_t bar() {}
> 
> 
> (note that this case works fine if the body of the function contains a 
> semicolon or reserved keyword, or if the next function returns a base type 
> [int, void...])


Repository:
  rC Clang

https://reviews.llvm.org/D42729



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon

2018-01-31 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

In https://reviews.llvm.org/D42729#993159, @djasper wrote:

> I think this case is not important enough to fix. Please tell users to just 
> delete the useless semicolon.


I would agree if were simple to spot: but often this may manifest itself only 
with a missing space between the function parameters and the function body, 
which can easily be overlooked...
Another option may be to create new pass which "removes" that extra semicolon: 
this way we would both fix it and get things right on next pass.

However, the issue with a function which contains only a macro and which is 
followed by another function which returns an custom type cannot so easily be 
fixed:

  void abort() {
FOO()
  }
  uint32_t bar() {}

(note that this case works fine if the body of the function contains a 
semicolon or reserved keyword, or if the next function returns a base type 
[int, void...])


Repository:
  rC Clang

https://reviews.llvm.org/D42729



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon

2018-01-31 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added a comment.

There are actually other cases, e.g. with macros "containing" the semicolon:

  void abort() {
FOO()
BAR()
  };

gets reformatted to this (still wrong with this patch, but the space after the 
parenthesis is added):

  void abort(){ FOO() BAR() };

And also this one (though it may be slightly different, there is no semicolon 
here):

  void abort() {
FOO()
  }
  uint32_t bar() {}

gets reformatted to:

  void abort(){ FOO() BAR() } uint32_t bar() {}

IMO the "real" fix would be to somehow let 
`TokenAnnotator::calculateFormattingInformation` split the current 
UnwrappedLine/AnnotatedLine, so that the 'end of the line' could be unwrapped 
properly: this would also allow keeping the invariant, and just implement some 
kind of backtracking. But it may be tricky to get the state right; maybe adding 
some child lines instead of just adding the lines may be simpler?


Repository:
  rC Clang

https://reviews.llvm.org/D42729



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon

2018-01-31 Thread Daniel Jasper via Phabricator via cfe-commits
djasper added a comment.

I think this case is not important enough to fix. Please tell users to just 
delete the useless semicolon.

In particular, my concern is that this makes the data-flow significantly more 
complicated. Early on in the development, we had separate token classes for the 
different states of the analysis (unwrapped line parsing, annotation, 
formatting) with const members. We had to move away from that as it didn't 
carry the costs that it had. However, this change changes the behavior from the 
BlockKind only ever being set inside the UnwrappedLineParser and being consumed 
later (e.g. in the TokenAnnotator) to a state where it is set in many places 
and setting and changing the value are inter-mingled. That makes it much harder 
to understand the invariants. E.g. how can we ensure that the incorrect 
BK_BracedInit hasn't already be consumed by something (now and in the future).


Repository:
  rC Clang

https://reviews.llvm.org/D42729



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon

2018-01-31 Thread Francois Ferrand via Phabricator via cfe-commits
Typz added inline comments.



Comment at: lib/Format/TokenAnnotator.cpp:1911
+  if (Next && Next->is(tok::l_brace) && Next->BlockKind == BK_BracedInit)
+  Next->BlockKind = BK_Block;
+}

this may actually not be enough in all cases: to completely match the 
'standard' behavior we should be able to "split" the UnwrappedLine at both 
opening and ending brace, in this case.


Repository:
  rC Clang

https://reviews.llvm.org/D42729



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D42729: clang-format: Fix formatting of function body followed by semicolon

2018-01-31 Thread Francois Ferrand via Phabricator via cfe-commits
Typz updated this revision to Diff 132130.
Typz added a comment.

update commit message


Repository:
  rC Clang

https://reviews.llvm.org/D42729

Files:
  lib/Format/TokenAnnotator.cpp
  unittests/Format/FormatTest.cpp

Index: unittests/Format/FormatTest.cpp
===
--- unittests/Format/FormatTest.cpp
+++ unittests/Format/FormatTest.cpp
@@ -11386,6 +11386,15 @@
"};");
 }
 
+TEST_F(FormatTest, MunchSemicolonAfterFunction) {
+  verifyFormat("void foo() { int i; };");
+  verifyFormat("void foo() {\n"
+   "  int i;\n"
+   "  int j;\n"
+   "};");
+  verifyFormat("void foo() {};");
+}
+
 TEST_F(FormatTest, ConfigurableContinuationIndentWidth) {
   FormatStyle TwoIndent = getLLVMStyleWithColumns(15);
   TwoIndent.ContinuationIndentWidth = 2;
Index: lib/Format/TokenAnnotator.cpp
===
--- lib/Format/TokenAnnotator.cpp
+++ lib/Format/TokenAnnotator.cpp
@@ -1783,9 +1783,12 @@
 }
 
 // This function heuristically determines whether 'Current' starts the name of a
-// function declaration.
-static bool isFunctionDeclarationName(const FormatToken &Current,
-  const AnnotatedLine &Line) {
+// function declaration, and returns the first token of parameters list in that
+// case.
+// @return The token which opens the parameters list (e.g. the opening
+// parenthesis), or nullptr if this is not a function declaration.
+static const FormatToken *getFunctionParameterList(const FormatToken &Current,
+   const AnnotatedLine &Line) {
   auto skipOperatorName = [](const FormatToken *Next) -> const FormatToken * {
 for (; Next; Next = Next->Next) {
   if (Next->is(TT_OverloadedOperatorLParen))
@@ -1809,58 +1812,58 @@
   const FormatToken *Next = Current.Next;
   if (Current.is(tok::kw_operator)) {
 if (Current.Previous && Current.Previous->is(tok::coloncolon))
-  return false;
+  return nullptr;
 Next = skipOperatorName(Next);
   } else {
 if (!Current.is(TT_StartOfName) || Current.NestingLevel != 0)
-  return false;
+  return nullptr;
 for (; Next; Next = Next->Next) {
   if (Next->is(TT_TemplateOpener)) {
 Next = Next->MatchingParen;
   } else if (Next->is(tok::coloncolon)) {
 Next = Next->Next;
 if (!Next)
-  return false;
+  return nullptr;
 if (Next->is(tok::kw_operator)) {
   Next = skipOperatorName(Next->Next);
   break;
 }
 if (!Next->is(tok::identifier))
-  return false;
+  return nullptr;
   } else if (Next->is(tok::l_paren)) {
 break;
   } else {
-return false;
+return nullptr;
   }
 }
   }
 
   // Check whether parameter list can belong to a function declaration.
   if (!Next || !Next->is(tok::l_paren) || !Next->MatchingParen)
-return false;
+return nullptr;
   // If the lines ends with "{", this is likely an function definition.
   if (Line.Last->is(tok::l_brace))
-return true;
+return Next;
   if (Next->Next == Next->MatchingParen)
-return true; // Empty parentheses.
+return Next; // Empty parentheses.
   // If there is an &/&& after the r_paren, this is likely a function.
   if (Next->MatchingParen->Next &&
   Next->MatchingParen->Next->is(TT_PointerOrReference))
-return true;
+return Next;
   for (const FormatToken *Tok = Next->Next; Tok && Tok != Next->MatchingParen;
Tok = Tok->Next) {
 if (Tok->is(tok::l_paren) && Tok->MatchingParen) {
   Tok = Tok->MatchingParen;
   continue;
 }
 if (Tok->is(tok::kw_const) || Tok->isSimpleTypeSpecifier() ||
 Tok->isOneOf(TT_PointerOrReference, TT_StartOfName, tok::ellipsis))
-  return true;
+  return Next;
 if (Tok->isOneOf(tok::l_brace, tok::string_literal, TT_ObjCMethodExpr) ||
 Tok->Tok.isLiteral())
-  return false;
+  return nullptr;
   }
-  return false;
+  return nullptr;
 }
 
 bool TokenAnnotator::mustBreakForReturnType(const AnnotatedLine &Line) const {
@@ -1899,8 +1902,14 @@
   FormatToken *Current = Line.First->Next;
   bool InFunctionDecl = Line.MightBeFunctionDecl;
   while (Current) {
-if (isFunctionDeclarationName(*Current, Line))
+if (const FormatToken *Tok = getFunctionParameterList(*Current, Line)) {
   Current->Type = TT_FunctionDeclarationName;
+
+  // Fix block kind, if it was mistakenly identified as a BracedInit
+  FormatToken *Next = Tok->MatchingParen->Next;
+  if (Next && Next->is(tok::l_brace) && Next->BlockKind == BK_BracedInit)
+  Next->BlockKind = BK_Block;
+}
 if (Current->is(TT_LineComment)) {
   if (Current->Previous->BlockKind == BK_BracedInit &&
   Current->Previous->opensScope())
___
cfe-commits mailing list
cfe-commi