D10621: Highlighting Indexer: list of suggestions
jpoelen added a comment. I agree that some rules are excessive or even false, but the last time I watched (it was several months ago now), some regexes suggested to be DetectString seemed to have writing errors (mainly `\\` count as 2 characters in xml) and some more or less useful propositions. If there has been no change at this level, it should still be checked. Actually, I came to the conclusion that the parser himself could make some changes on the fly. For example, DetectChar/Detect2Chars are specializations of DetectString and the concatenation of regexes could be automatic, which would make most checks obsolete. There would be only suggestions to turn a Regex into something else, but it is much easier using an AST rather than a very slobbery regexes. Finally, this commit will never make a complete list without false positives, especially since code reviews filter out such errors. We can abandon it. PS: this commit uses `attrToBool` as for the parser (if it has not changed) and adds 2 missing `return false`. I do not have time to take care of it at the moment. REPOSITORY R216 Syntax Highlighting REVISION DETAIL https://phabricator.kde.org/D10621 To: jpoelen, dhaumann Cc: kwrite-devel, kde-frameworks-devel, vkrause, #framework_syntax_highlighting, dhaumann, bmortimer, hase, michaelh, genethomas, ngraham, bruns, demsking, cullmann, sars
D10621: Highlighting Indexer: list of suggestions
dhaumann added a comment. Herald added a project: Kate. Herald edited subscribers, added: kde-frameworks-devel, kwrite-devel; removed: Frameworks. What do we do with this? I think some of the checks are a bit too aggressive, such as folding multiple regexps into a single one: This makes the regexps much less readable and therefore hard to maintain. @jpoelen Shall we abandon this, or do you still would like to see some parts of this committed? REPOSITORY R216 Syntax Highlighting REVISION DETAIL https://phabricator.kde.org/D10621 To: jpoelen, dhaumann Cc: kwrite-devel, kde-frameworks-devel, vkrause, #framework_syntax_highlighting, dhaumann, bmortimer, hase, michaelh, genethomas, ngraham, bruns, demsking, cullmann, sars, #frameworks
D10621: Highlighting Indexer: list of suggestions
jpoelen updated this revision to Diff 30448. jpoelen added a comment. New suggestions: - Regex "^xyz\b" -> WordDetect with column=0 - Regex "^xyz" with xyz a string of more than 2 characters -> StringDetext with column=0 Output: - Double quote are no longer escaped - "^\\s*" is no longer deleted REPOSITORY R216 Syntax Highlighting CHANGES SINCE LAST UPDATE https://phabricator.kde.org/D10621?vs=29869&id=30448 BRANCH suggest REVISION DETAIL https://phabricator.kde.org/D10621 AFFECTED FILES src/indexer/katehighlightingindexer.cpp To: jpoelen, dhaumann Cc: vkrause, #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, genethomas, ngraham, cullmann
D10621: Highlighting Indexer: list of suggestions
jpoelen updated this revision to Diff 29869. jpoelen added a comment. More suggestions and fixes some false positives. REPOSITORY R216 Syntax Highlighting CHANGES SINCE LAST UPDATE https://phabricator.kde.org/D10621?vs=27439&id=29869 BRANCH suggest REVISION DETAIL https://phabricator.kde.org/D10621 AFFECTED FILES src/indexer/katehighlightingindexer.cpp To: jpoelen, dhaumann Cc: vkrause, #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, genethomas, ngraham, cullmann
D10621: Highlighting Indexer: list of suggestions
jpoelen added a comment. I just tried with a format containing 5000 `` rules. This gives 2000 to 6000 more allocation than with WordDetect. Same for StringDetect vs Detect2Chars. At the speed level, for a single rule and a file with 13 * 8000 `aa `, WordDetect is 10% faster, but there is no difference between StringDetect and Detect2Chars. On the other hand, the more the number of rule increases the more the difference is important, even between StringDetect and Detect2Chars. 5000 times the same rule is extremely slow. The memory test is done with `XDG_DATA_DIRS=$PWD memusage kate-syntax-highlighter x.aa >/dev/null` Speed one with `XDG_DATA_DIRS=$PWD /usr/bin/time --format="%Es - %MK" kate-syntax-highlighter x.aa >/dev/null` $PWD/org.kde.syntax-highlighting/syntax/aa.xml: REPOSITORY R216 Syntax Highlighting REVISION DETAIL https://phabricator.kde.org/D10621 To: jpoelen, dhaumann Cc: vkrause, #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, genethomas, ngraham, cullmann
D10621: Highlighting Indexer: list of suggestions
dhaumann added a comment. @jpoelen What would be interesting is to check which optimizations are really an improvement. Because we should either get a significant speed boost (e.g. RegExpr -> WordDetect), or at least reduce memory allocations (possibly StringDetect -> Detect2Chars and DetectChar). Did you do some testing here ? REPOSITORY R216 Syntax Highlighting REVISION DETAIL https://phabricator.kde.org/D10621 To: jpoelen, dhaumann Cc: vkrause, #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, genethomas, cullmann
D10621: Highlighting Indexer: list of suggestions
jpoelen added a comment. I've already planned to fix all the suggestions, but I lost a lot of time updating the `sql*.xml` files and came across a very strange bug related to QRegularExpression by adding some tests. At the moment, I do not know if it's my tool to test, my version of Qt that is old or a real bug. Like last week my hard drive is dead, I took the opportunity to have a more recent configuration. I will make a few patches this weekend when I finished setting my new environment. The tool: https://github.com/jonathanpoelen/vt-kate-syntax-highlighter with the `-n` option. I also have a tool to generate a graph of the rules of an xml: https://github.com/jonathanpoelen/syntax-highlighting/tree/tools (graph.lua). The output is in a format readable by `graphviz`. The script is rudimentary and the xml parser does not work with some files (2 or 3). REPOSITORY R216 Syntax Highlighting REVISION DETAIL https://phabricator.kde.org/D10621 To: jpoelen, dhaumann Cc: vkrause, #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, genethomas, cullmann
D10621: Highlighting Indexer: list of suggestions
vkrause added a comment. Very nice idea! RegExp rules are the by far biggest cost factor, so every single one we can get rid of is good :) REPOSITORY R216 Syntax Highlighting REVISION DETAIL https://phabricator.kde.org/D10621 To: jpoelen, dhaumann Cc: vkrause, #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, genethomas, cullmann
D10621: Highlighting Indexer: list of suggestions
dhaumann added a comment. Just for info: https://kate-editor.org/2018/03/10/improving-syntax-highlighting-files/ tries to reach a broader audience for getting patches. REPOSITORY R216 Syntax Highlighting REVISION DETAIL https://phabricator.kde.org/D10621 To: jpoelen, dhaumann Cc: #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, genethomas, cullmann, vkrause
D10621: Highlighting Indexer: list of suggestions
dhaumann requested changes to this revision. dhaumann added a comment. This revision now requires changes to proceed. Btw, current generated list finds 1658 issues: https://paste.kde.org/p7iareuxc Not all of them are correct, for instance zsh.xml" line 919 RegExpr candidate for "Detect2Chars" : "%1" I believe %1 in this case is a backreference in a dynamic context, so a change to Detect2Chars would be wrong. Similarly, the suggestion html-php.xml" line 223 RegExpr candidate for "AnyChar" : "[^/><\"'\\s]" is also not correct, since [^xyz] means match everything _except_ xyz, so the ^ symbol negates this, afaik (or am I remembering incorrectly?)... Still, there are many valid items which certainly can be optimized. So patches for fixes welcome! In small chunks, if possible! REPOSITORY R216 Syntax Highlighting REVISION DETAIL https://phabricator.kde.org/D10621 To: jpoelen, dhaumann Cc: #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, genethomas, cullmann, vkrause
D10621: Highlighting Indexer: list of suggestions
dhaumann added a comment. As quick comment: I like the idea of programmatically suggesting better usage of rules. I did not yet fine the time to look into this, so I do not know how much output this currently generates. In any case, for the highlighting files we ship with the syntax highlighting framework, I would like to have zero error/warning output generated by the highlighting indexer (just like we have zero compiler warnings). So if this generates a lot of suggestions, would you also agree to fix the output? This could possibly touch many many files and be a lot of work :-) REPOSITORY R216 Syntax Highlighting REVISION DETAIL https://phabricator.kde.org/D10621 To: jpoelen, dhaumann Cc: #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, genethomas, cullmann, vkrause
D10621: Highlighting Indexer: list of suggestions
jpoelen added a project: Framework: Syntax Highlighting. jpoelen added a subscriber: Framework: Syntax Highlighting. REPOSITORY R216 Syntax Highlighting REVISION DETAIL https://phabricator.kde.org/D10621 To: jpoelen, dhaumann Cc: #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, genethomas, cullmann, vkrause
D10621: Highlighting Indexer: list of suggestions
jpoelen created this revision. jpoelen added a reviewer: dhaumann. Restricted Application added a project: Frameworks. Restricted Application added a subscriber: Frameworks. jpoelen requested review of this revision. REVISION SUMMARY Proposes mergers of rules and the replacement of: - RegExp by StringDetect, AnyChar, RangeDetect, etc. - StringDetect by DetectChar or Detect2Chars. REPOSITORY R216 Syntax Highlighting BRANCH suggest REVISION DETAIL https://phabricator.kde.org/D10621 AFFECTED FILES src/indexer/katehighlightingindexer.cpp To: jpoelen, dhaumann Cc: dhaumann, #frameworks, michaelh