Stamatis Zampetakis created CALCITE-6915:
--------------------------------------------

             Summary: Generalize terminology Linter to allow pattern based 
checks in commit messages
                 Key: CALCITE-6915
                 URL: https://issues.apache.org/jira/browse/CALCITE-6915
             Project: Calcite
          Issue Type: Improvement
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


CALCITE-6493 added checks that enforce certain terminology (mostly names of 
database systems) in commit messages. However, the existing checks still miss 
various terms. Consider, for instance, the ["snowflake" 
term|https://github.com/apache/calcite/blob/bfbe8930f4ed7ba8da530e862e212a057191cfa3/core/src/test/java/org/apache/calcite/test/LintTest.java#L378]
 and the following messages:
 # Add support for Snowflake dialect
 # Add support for snowflake dialect
 # Add support for snowFlake dialect
 # Add support for SnowFlake dialect

Only the first commit message should be valid, since the accepted term is 
"Snowflake". The check correctly flags message 2 as invalid but fails to catch 
messages 3 and 4.

The current implementation is based on an exact-match word pattern, so every 
single casing permutation of the word "snowflake" would have to be added to 
the 
[map|https://github.com/apache/calcite/blob/bfbe8930f4ed7ba8da530e862e212a057191cfa3/core/src/test/java/org/apache/calcite/test/LintTest.java#L71].
 This already happens to some extent for the 
[MySQL|https://github.com/apache/calcite/blob/bfbe8930f4ed7ba8da530e862e212a057191cfa3/core/src/test/java/org/apache/calcite/test/LintTest.java#L367]
 term, which appears twice in the map.
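
The limitation can be illustrated with a minimal sketch. This is not the code 
in LintTest; the class name, the map contents and the check method are 
hypothetical and only mimic an exact-match, case-sensitive lookup:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of an exact-match terminology check (not LintTest). */
final class ExactMatchSketch {
  // Assumed rule table: disallowed spelling -> accepted term.
  // Note that "MySQL" needs one entry per disallowed casing.
  static final Map<String, String> TERMS = new LinkedHashMap<>();
  static {
    TERMS.put("mysql", "MySQL");
    TERMS.put("Mysql", "MySQL");
    TERMS.put("snowflake", "Snowflake");
  }

  /** Returns the accepted term that the message violates, or null if none. */
  static String check(String message) {
    for (Map.Entry<String, String> e : TERMS.entrySet()) {
      // Whole-word, case-sensitive match of the disallowed spelling;
      // the pattern is recompiled on every call.
      Pattern p = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b");
      if (p.matcher(message).find()) {
        return e.getValue();
      }
    }
    return null;
  }
}
```

With this table, check("Add support for snowflake dialect") returns 
"Snowflake", while the "snowFlake" and "SnowFlake" messages return null and 
slip through unless each spelling gets its own entry.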

In some cases, terminology rules may require more than just different casing 
rules. For this reason, I propose to generalize the terminology linter to use a 
pattern-based definition that can capture more than one instance of a word, 
and to extend the reference term to be a Set instead of a single entry.

Some secondary improvements from the proposed generalization are:
 * the use of pre-compiled patterns that are instantiated only once
 * the switch from Map to List as the container of the rules for faster 
iteration and better readability



--
This message was sent by Atlassian Jira
(v8.20.10#820010)