Git commit b520021b17714aeec3b4a7fbffa6273c76655c6b by Dominik Haumann, on behalf of Nibaldo González. Committed on 19/10/2019 at 20:19. Pushed by scmsync into branch 'master'.
Update documentation of highlight & RegExp M +145 -2 doc/katepart/development.docbook M +38 -11 doc/katepart/regular-expressions.docbook https://commits.kde.org/kate/b520021b17714aeec3b4a7fbffa6273c76655c6b diff --git a/doc/katepart/development.docbook b/doc/katepart/development.docbook index 93e5046e9..632adfd7e 100644 --- a/doc/katepart/development.docbook +++ b/doc/katepart/development.docbook @@ -339,7 +339,8 @@ In this example, the <userinput>itemData</userinput> <emphasis>Normal Text</emph <varlistentry> <term>The last part of a highlight definition is the optional <userinput>general</userinput> section. It may contain information -about keywords, code folding, comments and indentation.</term> +about keywords, code folding, comments, indentation, empty lines and +spell checking.</term> <listitem> <para>The <userinput>comment</userinput> section defines with what @@ -350,12 +351,24 @@ user presses the corresponding shortcut for <emphasis>comment/uncomment</emphasi <para>The <userinput>keywords</userinput> section defines whether keyword lists are case sensitive or not. Other attributes will be explained later.</para> +<para>The other sections, <userinput>folding</userinput>, +<userinput>emptyLines</userinput> and <userinput>spellchecking</userinput>, +are usually not necessary and are explained later.</para> <programlisting> <general> <comments> <comment name="singleLine" start="#"/> </comments> <keywords casesensitive="1"/> + <folding indentationsensitive="0"/> + <emptyLines> + <emptyLine regexpr="\s+"/> + <emptyLine regexpr="\s*#.*"/> + </emptyLines> + <spellchecking> + <encoding char="á" string="\'a"/> + <encoding char="à" string="\`a"/> + </spellchecking> </general> </language> </programlisting> @@ -397,6 +410,10 @@ to the context specified in fallthroughContext if no rule matches. 
Default: <emphasis>false</emphasis>.</para> <para><userinput>fallthroughContext</userinput> specifies the next context if no rule matches.</para> +<para><userinput>noIndentationBasedFolding</userinput> disables indentation-based folding +in the context. If indentation-based folding is not activated, this attribute is useless. +This is defined in the element <emphasis>folding</emphasis> of the group <emphasis>general</emphasis>. +Default: <emphasis>false</emphasis>.</para> </listitem> </varlistentry> @@ -490,6 +507,35 @@ do not need to set it, as it defaults to <emphasis>false</emphasis>.</para> </varlistentry> +<varlistentry> +<term>The element <userinput>emptyLine</userinput> in the group <userinput>emptyLines</userinput> +defines which lines should be treated as empty lines. This allows modifying the behavior of the +<emphasis>lineEmptyContext</emphasis> attribute in the elements <userinput>context</userinput>. +Available attributes are:</term> + +<listitem> +<para><userinput>regexpr</userinput> defines a regular expression that will be treated as an empty line. +By default, empty lines do not contain any characters, therefore, this adds additional empty lines, +for example, if you want lines with spaces to also be considered empty lines. +However, in most syntax definitions you do not need to set this attribute.</para> +</listitem> +</varlistentry> + + +<varlistentry> +<term>The element <userinput>encoding</userinput> in the group <userinput>spellchecking</userinput> +defines a character encoding for spell checking. Available attributes:</term> + +<listitem> +<para><userinput>char</userinput> is a encoded character.</para> +<para><userinput>string</userinput> is a sequence of characters that will be encoded as +the character <emphasis>char</emphasis> in the spell checking. 
+For example, in the language LaTeX, the string <userinput>\"{A}</userinput> represents +the character <userinput>Ä</userinput>.</para> +</listitem> +</varlistentry> + + </variablelist> @@ -654,7 +700,7 @@ current context in its <userinput>string</userinput> or <userinput>char</userinput> attributes. In a <userinput>string</userinput>, the placeholder <replaceable>%N</replaceable> (where N is a number) will be replaced with the corresponding capture <replaceable>N</replaceable> -from the calling regular expression. In a +from the calling regular expression, starting from 1. In a <userinput>char</userinput> the placeholder must be a number <replaceable>N</replaceable> and it will be replaced with the first character of the corresponding capture <replaceable>N</replaceable> from the calling regular @@ -666,6 +712,93 @@ expression. Whenever a rule allows this attribute it will contain a </listitem> </itemizedlist> +<para>How does it work:</para> + +<para>In the <link linkend="regular-expressions">regular expressions</link> of the +<userinput>RegExpr</userinput> rules, all text within simple curved brackets +<userinput>(PATTERN)</userinput> is captured and remembered. 
+These captures can be used in the context to which it is switched, in the rules with the +attribute <userinput>dynamic</userinput> <emphasis>true</emphasis>, by +<replaceable>%N</replaceable> (in <emphasis>String</emphasis>) or +<replaceable>N</replaceable> (in <emphasis>char</emphasis>).</para> + +<para>It is important to mention that a text captured in a <userinput>RegExpr</userinput> rule is +only stored for the switched context, specified in its <userinput>context</userinput> attribute.</para> + +<tip> +<itemizedlist> + +<listitem> +<para>If the captures will not be used, both by dynamic rules and in the same regular expression, +<userinput>non-capturing groups</userinput> should be used: <userinput>(?:PATTERN)</userinput></para> +<para>The <emphasis>lookahead</emphasis> or <emphasis>lookbehind</emphasis> groups such as +<userinput>(?=PATTERN)</userinput> or <userinput>(?!PATTERN)</userinput> are not captured. +See <link linkend="regular-expressions">Regular Expressions</link> for more information.</para> +</listitem> + +<listitem> +<para>The capture groups can be used within the same regular expression, +using <replaceable>\N</replaceable> instead of <replaceable>%N</replaceable> respectively. +For more information, see <link linkend="regex-capturing">Capturing matching text (back references)</link> +in <link linkend="regular-expressions">Regular Expressions</link>.</para> +</listitem> + +</itemizedlist> +</tip> + +<para>Example 1:</para> +<para>In this simple example, the text matched by the regular expression +<userinput>=*</userinput> is captured and inserted into <replaceable>%1</replaceable> +in the dynamic rule. This allows the comment to end with the same amount of +<userinput>=</userinput> as at the beginning. 
This matches text like: +<userinput>[[ comment ]]</userinput>, <userinput>[=[ comment ]=]</userinput> or +<userinput>[=====[ comment ]=====]</userinput>.</para> +<para>In addition, the captures are available only in the switched context +<emphasis>Multi-line Comment</emphasis>.</para> + +<programlisting> +<context name="Normal" attribute="Normal Text" lineEndContext="#stay"> + <RegExpr context="Multi-line Comment" attribute="Comment" String="\[(=*)\[" beginRegion="RegionComment"/> +</context> +<context name="Multi-line Comment" attribute="Comment" lineEndContext="#stay"> + <StringDetect context="#pop" attribute="Comment" String="]%1]" dynamic="true" endRegion="RegionComment"/> +</context> +</programlisting> + +<para>Example 2:</para> +<para>In the dynamic rule, <replaceable>%1</replaceable> corresponds to the capture that matches +<userinput>#+</userinput>, and <replaceable>%2</replaceable> to <userinput>&quot;+</userinput>. +This matches text as: <userinput>#label""""inside the context""""#</userinput>.</para> +<para>These captures will not be available in other contexts, such as +<emphasis>OtherContext</emphasis>, <emphasis>FindEscapes</emphasis> or +<emphasis>SomeContext</emphasis>.</para> + +<programlisting> +<context name="SomeContext" attribute="Normal Text" lineEndContext="#stay"> + <RegExpr context="#pop!NamedString" attribute="String" String="(#+)(?:[\w-]|[^[:ascii:]])(&quot;+)"/> +</context> +<context name="NamedString" attribute="String" lineEndContext="#stay"> + <RegExpr context="#pop!OtherContext" attribute="String" String="%2(?:%1)?" dynamic="true"/> + <DetectChar context="FindEscapes" attribute="Escape" char="\"/> +</context> +</programlisting> + +<para>Example 3:</para> +<para>This matches text like: +<userinput>Class::function<T>( ... 
)</userinput>.</para> + +<programlisting> +<context name="Normal" attribute="Normal Text" lineEndContext="#stay"> + <RegExpr context="FunctionName" String="\b([a-zA-Z_][\w-]*)(::)([a-zA-Z_][\w-]*)(?:&lt;[\w\-\s]*&gt;)?(\()" lookAhead="true"/> +</context> +<context name="FunctionName" attribute="Normal Text" lineEndContext="#pop"> + <StringDetect context="#stay" attribute="Class" String="%1" dynamic="true"/> + <StringDetect context="#stay" attribute="Operator" String="%2" dynamic="true"/> + <StringDetect context="#stay" attribute="Function" String="%3" dynamic="true"/> + <DetectChar context="#pop" attribute="Normal Text" char="4" dynamic="true"/> +</context> +</programlisting> + <sect3 id="highlighting-rules-in-detail"> <title>The Rules in Detail</title> @@ -955,6 +1088,16 @@ The attribute <userinput>column</userinput> counts characters, so a tabulator is </para> </listitem> <listitem> +<para>In <userinput>RegExpr</userinput> rules, use the attribute <userinput>column="0"</userinput> if the pattern +<userinput>^PATTERN</userinput> will be used to match text at the beginning of a line. +This improves performance, as it will avoid looking for matches in the rest of the columns.</para> +</listitem> +<listitem> +<para>In regular expressions, use non-capturing groups <userinput>(?:PATTERN)</userinput> instead of +capturing groups <userinput>(PATTERN)</userinput>, if the captures will not be used in the same regular +expression or in dynamic rules. This avoids storing captures unnecessarily.</para> +</listitem> +<listitem> <para>You can switch contexts without processing characters. Assume that you want to switch context when you meet the string <userinput>*/</userinput>, but need to process that string in the next context. 
The below rule will match, and diff --git a/doc/katepart/regular-expressions.docbook b/doc/katepart/regular-expressions.docbook index 9c97fac7f..cdf00c5eb 100644 --- a/doc/katepart/regular-expressions.docbook +++ b/doc/katepart/regular-expressions.docbook @@ -240,15 +240,14 @@ corresponding to the octal number ooo (between 0 and <varlistentry> <term><userinput>\w</userinput></term> -<listitem><para>Matches any <quote>word character</quote> - in this case any letter or digit. Note that -underscore (<literal>_</literal>) is not matched, as is the case with perl regular expressions. -Equal to <literal>[a-zA-Z0-9]</literal></para></listitem> +<listitem><para>Matches any <quote>word character</quote> - in this case any letter, digit or underscore. +Equal to <literal>[a-zA-Z0-9_]</literal></para></listitem> </varlistentry> <varlistentry> <term><userinput>\W</userinput></term> -<listitem><para>Matches any non-word character - anything but letters or numbers. -Equal to <literal>[^a-zA-Z0-9]</literal> or <literal>[^\w]</literal></para></listitem> +<listitem><para>Matches any non-word character - anything but letters, numbers or underscore. +Equal to <literal>[^a-zA-Z0-9_]</literal> or <literal>[^\w]</literal></para></listitem> </varlistentry> @@ -256,13 +255,17 @@ Equal to <literal>[^a-zA-Z0-9]</literal> or <literal>[^\w]</literal></para></lis </para> +<para>The <emphasis>POSIX notation of classes</emphasis>, +<userinput>[:<class name>:]</userinput> are also supported. +For example, <userinput>[:digit:]</userinput> is equivalent to <userinput>\d</userinput>, +and <userinput>[:space:]</userinput> to <userinput>\s</userinput>. 
+See the full list of POSIX character classes +<ulink url="https://www.regular-expressions.info/posixbrackets.html">here</ulink>.</para> + <para>The abbreviated classes can be put inside a custom class, for example to match a word character, a blank or a dot, you could write <userinput>[\w \.]</userinput></para> -<note> <para>The POSIX notation of classes, <userinput>[:<class -name>:]</userinput> is currently not supported.</para> </note> - <sect3> <title>Characters with special meanings inside character classes</title> @@ -331,12 +334,14 @@ put the alternatives inside a subpattern: </sect3> -<sect3> +<sect3 id="regex-capturing"> <title>Capturing matching text (back references)</title> -<para>If you want to use a back reference, use a sub pattern to have -the desired part of the pattern remembered.</para> +<para>If you want to use a back reference, use a sub pattern <userinput>(PATTERN)</userinput> +to have the desired part of the pattern remembered. +To prevent the sub pattern from being remembered, use a non-capturing group +<userinput>(?:PATTERN)</userinput>.</para> <para>For example, if you want to find two occurrences of the same word separated by a comma and possibly some whitespace, you could @@ -657,6 +662,28 @@ pattern.</para> </listitem> </varlistentry> +<varlistentry> +<term><userinput>(PATTERN)</userinput> (Capturing group)</term> + +<listitem><para>The sub pattern within the parentheses is captured and remembered, +so that it can be used in back references. 
For example, the expression +<userinput>(&quot;+)[^&quot;]*\1</userinput> matches +<userinput>""""text""""</userinput> and +<userinput>"text"</userinput>.</para> +<para>See the section <link linkend="regex-capturing">Capturing matching text (back references)</link> +for more information.</para> +</listitem> +</varlistentry> + +<varlistentry> +<term><userinput>(?:PATTERN)</userinput> (Non-capturing group)</term> + +<listitem><para>The sub pattern within the parentheses is not captured and +is not remembered. It is preferable to always use non-capturing groups if +the captures will not be used.</para> +</listitem> +</varlistentry> + </variablelist> </para>
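The capture behaviour documented in this change can be tried outside of Kate. The following is a minimal sketch, not Kate's implementation: it uses Python's `re` module as a stand-in for the regular expression engine used by the highlighter, and the helper names `expand_dynamic` and `find_long_comment` are hypothetical, introduced only for illustration. It shows a back reference (`\1`) inside a single expression, and the substitution of a `%1` placeholder into the string of a dynamic rule, as in Example 1 of the diff.

```python
import re

# Back reference inside one regular expression: \1 repeats what group 1 captured.
QUOTED = re.compile(r'("+)[^"]*\1')

# Opening delimiter of a Lua-style long comment; group 1 captures the run of "=".
OPENING = re.compile(r'\[(=*)\[')


def expand_dynamic(template: str, captures: tuple) -> str:
    """Replace each %N placeholder with capture N (numbered from 1).

    Hypothetical helper: it only imitates what a dynamic rule does with
    its String attribute; it is not part of any Kate API.
    """
    return re.sub(r'%(\d)', lambda m: captures[int(m.group(1)) - 1], template)


def find_long_comment(text: str):
    """Return the first [=*[ ... ]=*] comment in text, or None if unterminated."""
    opening = OPENING.search(text)
    if opening is None:
        return None
    # Build the closing delimiter from the captured "=" run, like String="]%1]".
    closing = expand_dynamic("]%1]", opening.groups())
    end = text.find(closing, opening.end())
    if end == -1:
        return None
    return text[opening.start():end + len(closing)]


if __name__ == "__main__":
    print(find_long_comment("x = 1 [==[ a ]] b ]==] y"))
    # A non-capturing group (?:...) does not appear among the captures.
    print(re.match(r'(?:ab)+(c)', 'ababc').groups())
```

Because the closing delimiter is built from the capture, a shorter `]]` inside a `[==[ ... ]==]` comment does not end it. This is the reason the dynamic rule is needed instead of a fixed string.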