This is an automated email from the ASF dual-hosted git repository. mariofusco pushed a commit to branch dev-new-parser in repository https://gitbox.apache.org/repos/asf/incubator-kie-drools.git
commit 03b35bc3b850bd30f5084252a50fb23f2ae3cd4d Author: Toshiya Kobayashi <[email protected]> AuthorDate: Fri Nov 17 17:16:34 2023 +0900 [DROOLS-7577] drools-lsp : Add comments and explanations for drools-p… (#43) * [DROOLS-7577] drools-lsp : Add comments and explanations for drools-parser * removed syntax examples --- drools-drl/drools-drl10-parser/Developer_Notes.md | 38 ++++++++++ .../java/org/drools/parser/DRLErrorListener.java | 3 + .../java/org/drools/parser/DRLParserError.java | 3 + .../java/org/drools/parser/DRLParserHelper.java | 13 ++++ .../java/org/drools/parser/DRLParserWrapper.java | 6 ++ .../java/org/drools/parser/DRLVisitorImpl.java | 86 ++++++++++++++++------ .../java/org/drools/parser/ParserStringUtils.java | 18 ++++- 7 files changed, 142 insertions(+), 25 deletions(-) diff --git a/drools-drl/drools-drl10-parser/Developer_Notes.md b/drools-drl/drools-drl10-parser/Developer_Notes.md new file mode 100644 index 0000000000..c521ee6d14 --- /dev/null +++ b/drools-drl/drools-drl10-parser/Developer_Notes.md @@ -0,0 +1,38 @@ +## drools-parser + +This module is a reimplementation of the DRL (Drools Rule Language) parser based on ANTLR4. + +The current [DRL6Parser](https://github.com/apache/incubator-kie-drools/blob/main/drools-drl/drools-drl-parser/src/main/java/org/drools/drl/parser/lang/DRL6Parser.java) is based on ANTLR3 and contains a lot of custom modifications, which is hard to maintain. This new module should keep the separation between the parser syntax (`DRLParser.g4`) and the Descr generation (`DRLVisitorImpl.java`). + +This module started with a part of LSP to develop DRL editors, but it is not limited to that. This module will also replace DRL6Parser in the drools code base. + +### How is this developed? + +1. The starting point is [DRL6Parser](https://github.com/apache/incubator-kie-drools/blob/main/drools-drl/drools-drl-parser/src/main/java/org/drools/drl/parser/lang/DRL6Parser.java). While it contains lots of customizations, we can map its javadoc (e.g. `packageStatement := PACKAGE qualifiedIdentifier SEMICOLON?`) to `DRLParser.g4` (e.g. `packagedef : PACKAGE name=drlQualifiedName SEMI? ;`). +2. `DRLLexer.g4` is written to define tokens for DRL. +3. `DRLLexer.g4` imports `JavaLexer.g4` to reuse Java tokens. `DRLParser.g4` imports `JavaParser.g4` to reuse Java grammar. These Java parser files are distributed by ANTLR4 under BSD license. +4. In `DRLLexer.g4`, basically define tokens with a prefix "DRL_" to clarify they are DRL keywords. +5. In `DRLParser.g4`, define parser rules with a prefix "drl" if the rule name conflicts with `JavaParser.g4`. Sometimes we need to do that, because such a rule may contain DRL keywords. +6. (As of 2023/10/31) this parser doesn't deeply parse rule RHS (just multiple `RHS_CHUNK`s), because Drools passes RHS text to drools-compiler as-is. In case of developing DRL editors, we may need to integrate another Java LSP to support RHS code completion, etc. +7. LHS constraint (e.g. `age > 30`) is also handled as text. Further processing will be done in the later compiler phase. +8. `DRLParser` processes a DRL text and produces an AST(abstract syntax tree). Then apply `DRLVisitorImpl` to generate PackageDescr following the visitor pattern. So the main work would be implementing `DRLParser.g4` and `DRLVisitorImpl`. +9. Errors are handled by `DRLErrorListener` +10. (As of 2023/10/31) We have 2 test classes. `DRLParserTest` is a very basic test to check if the parser can parse DRL. `MiscDRLParserTest` contains various DRL syntax to check if the parser generates correct Descr objects. `MiscDRLParserTest` was ported from [RuleParserTest](https://github.com/apache/incubator-kie-drools/blob/main/drools-test-coverage/test-compiler-integration/src/test/java/org/drools/mvel/compiler/lang/RuleParserTest.java) so that we can ensure the compatibility of g [...] +11. As `DRL6Parser` contains hard-coded customizations, sometimes we need to read and understand the `DRL6Parser` source codes to meet the compatibility. +12. (As of 2023/10/31) `MiscDRLParserTest` still has several test cases with `@Disabled` which are relatively lower priority or edge cases. They need to be resolved at some point in the future. To fix the issues, file a JIRA, remove the `@Disabled` annotation, and fix the implementation to pass the test case. + +### Next steps + +1. Create a feature branch in drools repo and replace `DRL6Parser` with this new parser. +2. We will detect issues in the new parser by running the existing tests in drools repo. If we find any issues, we will fix them in the new parser and add new tests to cover them. Such tests would be more or less Descr comparison tests, so we would add a new test class which is similar to `MiscDRLParserTest`. + +### Refactoring candidates +- `DRLParserHelper` and `DRLParserWrapper` have some duplicated code and purpose. We can merge them into one class. +- `MiscDRLParserTest` can be cleaner and fixed to align with SonarLint suggestions. +- Constraint related parser rules after `conditionalOrExpression` are written in antlr3 style. They could be refactored to antlr4 style (like `lhsExpression`). + +### Development tips +- IntelliJ IDEA has an ANTLR4 plugin, which "ANTLR Preview" window displays a parse tree. It is very useful to debug the parser rules. + +### Resources +[The Definitive ANTLR 4 Reference](https://pragprog.com/titles/tpantlr2/the-definitive-antlr-4-reference/) \ No newline at end of file diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLErrorListener.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLErrorListener.java index a15d12addd..f7c6504df4 100644 --- a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLErrorListener.java +++ b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLErrorListener.java @@ -7,6 +7,9 @@ import org.antlr.v4.runtime.BaseErrorListener; import org.antlr.v4.runtime.RecognitionException; import org.antlr.v4.runtime.Recognizer; +/** + * Collect errors while parsing DRL + */ public class DRLErrorListener extends BaseErrorListener { private final List<DRLParserError> errors = new ArrayList<>(); diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserError.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserError.java index 6b9fe11694..ec3e41747d 100644 --- a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserError.java +++ b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserError.java @@ -1,5 +1,8 @@ package org.drools.parser; +/** + * Error information while parsing DRL + */ public class DRLParserError { private int lineNumber; diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserHelper.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserHelper.java index 3bd15b3f4f..bee7a6e96d 100644 --- a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserHelper.java +++ b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserHelper.java @@ -10,11 +10,18 @@ import org.antlr.v4.runtime.tree.ParseTree; import org.antlr.v4.runtime.tree.TerminalNode; import org.drools.drl.ast.descr.PackageDescr; +/** + * Collection of static helper methods for DRLParser + */ public class DRLParserHelper { private DRLParserHelper() { } + /** + * Entry point for parsing DRL. + * Unlike DRLParserWrapper.parse(), this method does not collect errors. + */ public static PackageDescr parse(String drl) { DRLParser drlParser = createDrlParser(drl); return compilationUnitContext2PackageDescr(drlParser.compilationUnit(), drlParser.getTokenStream()); @@ -27,6 +34,9 @@ public class DRLParserHelper { return new DRLParser(commonTokenStream); } + /** + * DRLVisitorImpl visits a parse tree and creates a PackageDescr + */ public static PackageDescr compilationUnitContext2PackageDescr(DRLParser.CompilationUnitContext ctx, TokenStream tokenStream) { DRLVisitorImpl visitor = new DRLVisitorImpl(tokenStream); Object descr = visitor.visit(ctx); @@ -37,6 +47,9 @@ public class DRLParserHelper { } } + /** + * Given a row and column of the input DRL, return the index of the matched token + */ public static Integer computeTokenIndex(DRLParser parser, int row, int col) { for (int i = 0; i < parser.getInputStream().size(); i++) { Token token = parser.getInputStream().get(i); diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserWrapper.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserWrapper.java index 06992f0440..8f24bf1d3f 100644 --- a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserWrapper.java +++ b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserWrapper.java @@ -10,12 +10,18 @@ import org.slf4j.LoggerFactory; import static org.drools.parser.DRLParserHelper.compilationUnitContext2PackageDescr; +/** + * Wrapper for DRLParser. Somewhat duplicated from DRLParserHelper, but this class is instantiated and holds errors. + */ public class DRLParserWrapper { private static final Logger LOGGER = LoggerFactory.getLogger(DRLParserWrapper.class); private final List<DRLParserError> errors = new ArrayList<>(); + /** + * Main entry point for parsing DRL + */ public PackageDescr parse(String drl) { DRLParser drlParser = DRLParserHelper.createDrlParser(drl); DRLErrorListener errorListener = new DRLErrorListener(); diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLVisitorImpl.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLVisitorImpl.java index 8212ac32be..3f920cc13a 100644 --- a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLVisitorImpl.java +++ b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLVisitorImpl.java @@ -48,6 +48,12 @@ import static org.drools.parser.ParserStringUtils.safeStripStringDelimiters; import static org.drools.parser.ParserStringUtils.trimThen; import static org.drools.util.StringUtils.unescapeJava; +/** + * Visitor implementation for DRLParser. + * Basically, each visit method creates and returns a Descr object traversing the parse tree. + * Finally, visitCompilationUnit() returns a PackageDescr object. + * Try not to depend on DRLVisitorImpl's internal state for clean maintainability + */ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { private final TokenStream tokenStream; @@ -56,6 +62,9 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { this.tokenStream = tokenStream; } + /** + * Main entry point for creating PackageDescr from a parser tree. + */ @Override public PackageDescr visitCompilationUnit(DRLParser.CompilationUnitContext ctx) { PackageDescr packageDescr = new PackageDescr(); @@ -67,7 +76,11 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { return packageDescr; } + /** + * Add all children Descr to PackageDescr + */ private void applyChildrenDescrs(PackageDescr packageDescr, List<BaseDescr> descrList) { + // This bunch of if-blocks will be refactored by DROOLS-7564 descrList.forEach(descr -> { if (descr instanceof UnitDescr) { packageDescr.setUnit((UnitDescr) descr); @@ -147,6 +160,8 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { functionDescr.setReturnType("void"); } functionDescr.setName(ctx.IDENTIFIER().getText()); + + // add function parameters DRLParser.FormalParametersContext formalParametersContext = ctx.formalParameters(); DRLParser.FormalParameterListContext formalParameterListContext = formalParametersContext.formalParameterList(); if (formalParameterListContext != null) { @@ -163,7 +178,7 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { @Override public BaseDescr visitDeclaredef(DRLParser.DeclaredefContext ctx) { - return visitDescrChildren(ctx).get(0); + return visitDescrChildren(ctx).get(0); // only one child } @Override @@ -202,6 +217,9 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { return windowDeclarationDescr; } + /** + * entry point for one rule + */ @Override public RuleDescr visitRuledef(DRLParser.RuledefContext ctx) { RuleDescr ruleDescr = new RuleDescr(safeStripStringDelimiters(ctx.name.getText())); @@ -228,16 +246,27 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { if (ctx.rhs() != null) { ruleDescr.setConsequenceLocation(ctx.rhs().getStart().getLine(), ctx.rhs().getStart().getCharPositionInLine()); // location of "then" - ruleDescr.setConsequence(trimThen(getTextPreservingWhitespace(ctx.rhs()))); + ruleDescr.setConsequence(trimThen(getTextPreservingWhitespace(ctx.rhs()))); // RHS is just a text } return ruleDescr; } private void slimLhsRootDescr(AndDescr root) { + // Root Descr is always AndDescr. + // For example, if there are nested AndDescr like + // AndDescr + // /\ + // P AndDescr + // /\ + // P P + // is slimmed down to + // AndDescr + // / | \ + // P P P List<BaseDescr> descrList = new ArrayList<>(root.getDescrs()); root.getDescrs().clear(); - descrList.forEach(root::addOrMerge); // This slims down nested AndDescr + descrList.forEach(root::addOrMerge); } @Override @@ -373,6 +402,9 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { return attributeDescr; } + /** + * entry point for LHS + */ @Override public List<BaseDescr> visitLhs(DRLParser.LhsContext ctx) { if (ctx.lhsExpression() != null) { @@ -394,10 +426,12 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { } private PatternDescr getSinglePatternDescr(DRLParser.LhsPatternBindContext ctx) { - Optional<BaseDescr> optPatternDescr = visitFirstDescrChild(ctx); - PatternDescr patternDescr = optPatternDescr.filter(PatternDescr.class::isInstance) - .map(PatternDescr.class::cast) - .orElseThrow(() -> new IllegalStateException("lhsPatternBind must have at least one lhsPattern : " + ctx.getText())); + List<BaseDescr> patternDescrList = visitDescrChildren(ctx); + if (patternDescrList.isEmpty() || !(patternDescrList.get(0) instanceof PatternDescr)) { + throw new IllegalStateException("lhsPatternBind must have at least one lhsPattern : " + ctx.getText()); + } + PatternDescr patternDescr = (PatternDescr)patternDescrList.get(0); + if (ctx.label() != null) { patternDescr.setIdentifier(ctx.label().IDENTIFIER().getText()); } else if (ctx.unif() != null) { @@ -423,6 +457,9 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { return orDescr; } + /** + * entry point for a Pattern + */ @Override public PatternDescr visitLhsPattern(DRLParser.LhsPatternContext ctx) { PatternDescr patternDescr = new PatternDescr(ctx.objectType.getText()); @@ -528,6 +565,9 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { return new EntryPointDescr(safeStripStringDelimiters(ctx.stringId().getText())); } + /** + * Collect constraints in a Pattern + */ @Override public List<ExprConstraintDescr> visitConstraints(DRLParser.ConstraintsContext ctx) { List<ExprConstraintDescr> exprConstraintDescrList = new ArrayList<>(); @@ -535,6 +575,9 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { return exprConstraintDescrList; } + /** + * Collect constraints in a Pattern. Positional constraints comes first with semicolon. + */ private List<ExprConstraintDescr> visitConstraints(DRLParser.PositionalConstraintsContext positionalCtx, DRLParser.ConstraintsContext ctx) { List<ExprConstraintDescr> exprConstraintDescrList = new ArrayList<>(); populateExprConstraintDescrList(positionalCtx, exprConstraintDescrList); @@ -557,6 +600,9 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { } } + /** + * Takes one constraint as String and create ExprConstraintDescr + */ @Override public ExprConstraintDescr visitConstraint(DRLParser.ConstraintContext ctx) { String constraint = visitConstraintChildren(ctx); @@ -618,6 +664,7 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { @Override public BaseDescr visitLhsExpressionEnclosed(DRLParser.LhsExpressionEnclosedContext ctx) { + // enclosed expression is simply stripped because Descr itself is encapsulated return (BaseDescr) visit(ctx.lhsExpression()); } @@ -708,7 +755,7 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { @Override public BaseDescr visitLhsUnary(DRLParser.LhsUnaryContext ctx) { - return visitDescrChildren(ctx).get(0); + return visitDescrChildren(ctx).get(0); // lhsUnary has only one child } private void populateStartEnd(BaseDescr descr, ParserRuleContext ctx) { @@ -718,6 +765,10 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { descr.setEndCharacter(ctx.getStop().getStopIndex()); } + /** + * This is a special version of visitChildren(). + * This collects children BaseDescr objects and returns them as a list. + */ private List<BaseDescr> visitDescrChildren(RuleNode node) { List<BaseDescr> aggregator = new ArrayList<>(); int n = node.getChildCount(); @@ -768,21 +819,10 @@ public class DRLVisitorImpl extends DRLParserBaseVisitor<Object> { } } - // leaves of constraint concatenate return Strings + /** + * Return the text of constraint as-is + */ private String visitConstraintChildren(ParserRuleContext ctx) { return getTokenTextPreservingWhitespace(ctx, tokenStream); } - - private Optional<BaseDescr> visitFirstDescrChild(RuleNode node) { - int n = node.getChildCount(); - - for (int i = 0; i < n && this.shouldVisitNextChild(node, null); ++i) { - ParseTree c = node.getChild(i); - Object childResult = c.accept(this); - if (childResult instanceof BaseDescr) { - return Optional.of((BaseDescr) childResult); - } - } - return Optional.empty(); - } -} +} \ No newline at end of file diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/ParserStringUtils.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/ParserStringUtils.java index 1f84eb6f9c..a294bf2a92 100644 --- a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/ParserStringUtils.java +++ b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/ParserStringUtils.java @@ -5,13 +5,17 @@ import org.antlr.v4.runtime.TokenStream; import org.antlr.v4.runtime.misc.Interval; /** - * will be merged in drools-util + * Collection of String utilities used by DRLParser. + * This may be merged in drools-util */ public class ParserStringUtils { private ParserStringUtils() { } + /** + * Strip string delimiters (e.g. "foo" -> foo) + */ public static String safeStripStringDelimiters(String value) { if (value != null) { value = value.trim(); @@ -22,6 +26,9 @@ public class ParserStringUtils { return value; } + /** + * Get text from ParserRuleContext's CharStream without trimming whitespace + */ public static String getTextPreservingWhitespace(ParserRuleContext ctx) { // Using raw CharStream int startIndex = ctx.start.getStartIndex(); @@ -34,11 +41,18 @@ public class ParserStringUtils { return ctx.start.getTokenSource().getInputStream().getText(interval); } + /** + * Get text from ParserRuleContext's CharStream without trimming whitespace + * tokenStream is required to get hidden channel token (e.g. whitespace). + * Unlike getTextPreservingWhitespace, this method reflects Lexer normalizeString + */ public static String getTokenTextPreservingWhitespace(ParserRuleContext ctx, TokenStream tokenStream) { - // tokenStream is required to get hidden channel token (e.g. whitespace). Unlike getTextPreservingWhitespace, this method reflects Lexer normalizeString return tokenStream.getText(ctx.start, ctx.stop); } + /** + * Just remove leading "then" + */ public static String trimThen(String rhs) { if (rhs.startsWith("then")) { return rhs.substring("then".length()); --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
