D10621: Highlighting Indexer: list of suggestions

2019-01-09 Thread jonathan poelen
jpoelen added a comment.


  I agree that some rules are excessive or even false, but the last time I 
watched (it was several months ago now), some regexes suggested to be 
DetectString seemed to have writing errors (mainly `\\` count as 2 characters 
in xml) and some more or less useful propositions. If there has been no change 
at this level, it should still be checked.
  
  Actually, I came to the conclusion that the parser himself could make some 
changes on the fly. For example, DetectChar/Detect2Chars are specializations of 
DetectString and the concatenation of regexes could be automatic, which would 
make most checks obsolete. There would be only suggestions to turn a Regex into 
something else, but it is much easier using an AST rather than a very slobbery 
regexes.
  
  Finally, this commit will never make a complete list without false positives, 
especially since code reviews filter out such errors. We can abandon it.
  
  PS: this commit uses `attrToBool` as for the parser (if it has not changed) 
and adds 2 missing `return false`. I do not have time to take care of it at the 
moment.

REPOSITORY
  R216 Syntax Highlighting

REVISION DETAIL
  https://phabricator.kde.org/D10621

To: jpoelen, dhaumann
Cc: kwrite-devel, kde-frameworks-devel, vkrause, 
#framework_syntax_highlighting, dhaumann, bmortimer, hase, michaelh, 
genethomas, ngraham, bruns, demsking, cullmann, sars


D10621: Highlighting Indexer: list of suggestions

2019-01-08 Thread Dominik Haumann
dhaumann added a comment.
Herald added a project: Kate.
Herald edited subscribers, added: kde-frameworks-devel, kwrite-devel; removed: 
Frameworks.


  What do we do with this?
  
  I think some of the checks are a bit too aggressive, such as folding multiple 
regexps into a single one: This makes the regexps much less readable and 
therefore hard to maintain.
  
  @jpoelen Shall we abandon this, or do you still would like to see some parts 
of this committed?

REPOSITORY
  R216 Syntax Highlighting

REVISION DETAIL
  https://phabricator.kde.org/D10621

To: jpoelen, dhaumann
Cc: kwrite-devel, kde-frameworks-devel, vkrause, 
#framework_syntax_highlighting, dhaumann, bmortimer, hase, michaelh, 
genethomas, ngraham, bruns, demsking, cullmann, sars, #frameworks


D10621: Highlighting Indexer: list of suggestions

2018-03-24 Thread jonathan poelen
jpoelen updated this revision to Diff 30448.
jpoelen added a comment.


  New suggestions:
  
  - Regex "^xyz\b" -> WordDetect with column=0
  - Regex "^xyz" with xyz a string of more than 2 characters -> StringDetext 
with column=0
  
  Output:
  
  - Double quote are no longer escaped
  - "^\\s*" is no longer deleted

REPOSITORY
  R216 Syntax Highlighting

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D10621?vs=29869&id=30448

BRANCH
  suggest

REVISION DETAIL
  https://phabricator.kde.org/D10621

AFFECTED FILES
  src/indexer/katehighlightingindexer.cpp

To: jpoelen, dhaumann
Cc: vkrause, #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, 
genethomas, ngraham, cullmann


D10621: Highlighting Indexer: list of suggestions

2018-03-18 Thread jonathan poelen
jpoelen updated this revision to Diff 29869.
jpoelen added a comment.


  More suggestions and fixes some false positives.

REPOSITORY
  R216 Syntax Highlighting

CHANGES SINCE LAST UPDATE
  https://phabricator.kde.org/D10621?vs=27439&id=29869

BRANCH
  suggest

REVISION DETAIL
  https://phabricator.kde.org/D10621

AFFECTED FILES
  src/indexer/katehighlightingindexer.cpp

To: jpoelen, dhaumann
Cc: vkrause, #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, 
genethomas, ngraham, cullmann


D10621: Highlighting Indexer: list of suggestions

2018-03-17 Thread jonathan poelen
jpoelen added a comment.


  I just tried with a format containing 5000 `` rules. This gives 2000 to 6000 more 
allocation than with WordDetect. Same for StringDetect vs Detect2Chars.
  
  At the speed level, for a single rule and a file with 13 * 8000 `aa `, 
WordDetect is 10% faster, but there is no difference between StringDetect and 
Detect2Chars. On the other hand, the more the number of rule increases the more 
the difference is important, even between StringDetect and Detect2Chars. 5000 
times the same rule is extremely slow.
  
  The memory test is done with `XDG_DATA_DIRS=$PWD memusage 
kate-syntax-highlighter x.aa >/dev/null`
  Speed one with `XDG_DATA_DIRS=$PWD /usr/bin/time --format="%Es - %MK" 
kate-syntax-highlighter x.aa >/dev/null`
  
  $PWD/org.kde.syntax-highlighting/syntax/aa.xml:
  





 
   
   
   
   
 


 
 




REPOSITORY
  R216 Syntax Highlighting

REVISION DETAIL
  https://phabricator.kde.org/D10621

To: jpoelen, dhaumann
Cc: vkrause, #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, 
genethomas, ngraham, cullmann


D10621: Highlighting Indexer: list of suggestions

2018-03-13 Thread Dominik Haumann
dhaumann added a comment.


  @jpoelen What would be interesting is to check which optimizations are really 
an improvement. Because we should either get a significant speed boost (e.g. 
RegExpr -> WordDetect), or at least reduce memory allocations (possibly 
StringDetect -> Detect2Chars and DetectChar).
  
  Did you do some testing here ?

REPOSITORY
  R216 Syntax Highlighting

REVISION DETAIL
  https://phabricator.kde.org/D10621

To: jpoelen, dhaumann
Cc: vkrause, #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, 
genethomas, cullmann


D10621: Highlighting Indexer: list of suggestions

2018-03-12 Thread jonathan poelen
jpoelen added a comment.


  I've already planned to fix all the suggestions, but I lost a lot of time 
updating the `sql*.xml` files and came across a very strange bug related to 
QRegularExpression by adding some tests. At the moment, I do not know if it's 
my tool to test, my version of Qt that is old or a real bug. Like last week my 
hard drive is dead, I took the opportunity to have a more recent configuration. 
I will make a few patches this weekend when I finished setting my new 
environment.
  
  The tool: https://github.com/jonathanpoelen/vt-kate-syntax-highlighter with 
the `-n` option.
  I also have a tool to generate a graph of the rules of an xml: 
https://github.com/jonathanpoelen/syntax-highlighting/tree/tools (graph.lua). 
The output is in a format readable by `graphviz`. The script is rudimentary and 
the xml parser does not work with some files (2 or 3).

REPOSITORY
  R216 Syntax Highlighting

REVISION DETAIL
  https://phabricator.kde.org/D10621

To: jpoelen, dhaumann
Cc: vkrause, #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, 
genethomas, cullmann


D10621: Highlighting Indexer: list of suggestions

2018-03-11 Thread Volker Krause
vkrause added a comment.


  Very nice idea! RegExp rules are the by far biggest cost factor, so every 
single one we can get rid of is good :)

REPOSITORY
  R216 Syntax Highlighting

REVISION DETAIL
  https://phabricator.kde.org/D10621

To: jpoelen, dhaumann
Cc: vkrause, #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, 
genethomas, cullmann


D10621: Highlighting Indexer: list of suggestions

2018-03-10 Thread Dominik Haumann
dhaumann added a comment.


  Just for info: 
https://kate-editor.org/2018/03/10/improving-syntax-highlighting-files/ tries 
to reach a broader audience for getting patches.

REPOSITORY
  R216 Syntax Highlighting

REVISION DETAIL
  https://phabricator.kde.org/D10621

To: jpoelen, dhaumann
Cc: #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, 
genethomas, cullmann, vkrause


D10621: Highlighting Indexer: list of suggestions

2018-03-10 Thread Dominik Haumann
dhaumann requested changes to this revision.
dhaumann added a comment.
This revision now requires changes to proceed.


  Btw, current generated list finds 1658 issues: https://paste.kde.org/p7iareuxc
  
  Not all of them are correct, for instance
  
zsh.xml" line 919 RegExpr candidate for  "Detect2Chars" : "%1"
  
  I believe %1 in this case is a backreference in a dynamic context, so a 
change to Detect2Chars would be wrong.
  
  Similarly, the suggestion
  
html-php.xml" line 223 RegExpr candidate for  "AnyChar" : "[^/><\"'\\s]"
  
  is also not correct, since [^xyz] means match everything _except_ xyz, so the 
^ symbol negates this, afaik (or am I remembering incorrectly?)...
  
  Still, there are many valid items which certainly can be optimized. So 
patches for fixes welcome! In small chunks, if possible!

REPOSITORY
  R216 Syntax Highlighting

REVISION DETAIL
  https://phabricator.kde.org/D10621

To: jpoelen, dhaumann
Cc: #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, 
genethomas, cullmann, vkrause


D10621: Highlighting Indexer: list of suggestions

2018-03-10 Thread Dominik Haumann
dhaumann added a comment.


  As quick comment: I like the idea of programmatically suggesting better usage 
of rules. I did not yet fine the time to look into this, so I do not know how 
much output this currently generates.
  
  In any case, for the highlighting files we ship with the syntax highlighting 
framework, I would like to have zero error/warning output generated by the 
highlighting indexer (just like we have zero compiler warnings). So if this 
generates a lot of suggestions, would you also agree to fix the output? This 
could possibly touch many many files and be a lot of work :-)

REPOSITORY
  R216 Syntax Highlighting

REVISION DETAIL
  https://phabricator.kde.org/D10621

To: jpoelen, dhaumann
Cc: #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, 
genethomas, cullmann, vkrause


D10621: Highlighting Indexer: list of suggestions

2018-02-17 Thread jonathan poelen
jpoelen added a project: Framework: Syntax Highlighting.
jpoelen added a subscriber: Framework: Syntax Highlighting.

REPOSITORY
  R216 Syntax Highlighting

REVISION DETAIL
  https://phabricator.kde.org/D10621

To: jpoelen, dhaumann
Cc: #framework_syntax_highlighting, dhaumann, #frameworks, michaelh, 
genethomas, cullmann, vkrause


D10621: Highlighting Indexer: list of suggestions

2018-02-17 Thread jonathan poelen
jpoelen created this revision.
jpoelen added a reviewer: dhaumann.
Restricted Application added a project: Frameworks.
Restricted Application added a subscriber: Frameworks.
jpoelen requested review of this revision.

REVISION SUMMARY
  Proposes mergers of rules and the replacement of:
  
  - RegExp by StringDetect, AnyChar, RangeDetect, etc.
  - StringDetect by DetectChar or Detect2Chars.

REPOSITORY
  R216 Syntax Highlighting

BRANCH
  suggest

REVISION DETAIL
  https://phabricator.kde.org/D10621

AFFECTED FILES
  src/indexer/katehighlightingindexer.cpp

To: jpoelen, dhaumann
Cc: dhaumann, #frameworks, michaelh