GitHub user ijokarumawak opened a pull request: https://github.com/apache/nifi/pull/3200
NIFI-5826 WIP Fix back-slash escaping at Lexers

## Summary

The current Lexers convert a back-slash followed by another character (e.g. '\[') into a double back-slash followed by that character (e.g. '\\['). Detailed analysis (see below) suggests that this conversion is wrong and that such sequences should be left as they are.

## Details

I debugged how the Lexer works and found that:

- The `ESC` fragment handles escaped special characters in a String representation, i.e. the String `\t` is converted to an actual tab character.
- The string values a user inputs from the NiFi UI are passed to the `RecordPath.compile` method as they are. E.g. the input string `replaceRegex(/name, '\[', '')` is passed as is, and then the single back-slash is converted to a double back-slash by the ESC fragment at line 155.
- I believe lines 153-156 are meant to preserve escaped characters as they are, because such escapes don't mean anything in the RecordPath/AttrExpLang spec. They should be unescaped later by the underlying syntaxes such as RegEx.
- The current line 155 does this wrongly. It should append a single back-slash.
- Other Lexers (AttributeExpressionLexer.g and HL7QueryLexer.g) have the same issue.
- So, I think we should fix all Lexers instead of adding another conversion.

Here is the [Lexer code](https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record-path/src/main/antlr3/org/apache/nifi/record/path/RecordPathLexer.g#L143) for reference:

```
143 fragment
144 ESC
145   : '\\'
146     (
147         '"'    { setText("\""); }
148       | '\''  { setText("\'"); }
149       | 'r'   { setText("\r"); }
150       | 'n'   { setText("\n"); }
151       | 't'   { setText("\t"); }
152       | '\\'  { setText("\\\\"); }
153       | nextChar = ~('"' | '\'' | 'r' | 'n' | 't' | '\\')
154        {
155          StringBuilder lBuf = new StringBuilder(); lBuf.append("\\\\").appendCodePoint(nextChar); setText(lBuf.toString());
156        }
157     )
158   ;
```

## NiFi template for test

Here is a NiFi flow template to compare the behavior before and after this change:
https://gist.github.com/ijokarumawak/b6bdca8074a4457bc4a425b90a6b17f0

To try the template, build this PR as NiFi 1.9.0-SNAPSHOT, then download the following 1.8.0 nars into your SNAPSHOT's lib dir, so that both versions can be used in the flow:

- https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-standard-nar/1.8.0/nifi-standard-nar-1.8.0.nar
- https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-update-attribute-nar/1.8.0/nifi-update-attribute-nar-1.8.0.nar

## Test result

### UpdateAttribute test for backward compatibility

![image](https://user-images.githubusercontent.com/1107620/49493740-93078700-f8a0-11e8-9360-025254b39551.png)

GenerateFlowFile generates FlowFiles with attribute `a` whose value is:

```
this is new line and this is just a backslash \n
```

![image](https://user-images.githubusercontent.com/1107620/49493751-9c90ef00-f8a0-11e8-8d41-6005f01157a7.png)

Result

1.8.0

![image](https://user-images.githubusercontent.com/1107620/49493779-b5010980-f8a0-11e8-9911-22c0d71e865b.png)

1.9.0-SNAPSHOT

![image](https://user-images.githubusercontent.com/1107620/49493786-baf6ea80-f8a0-11e8-8c04-7efb54167345.png)

### UpdateRecord test illustrating the NIFI-5826 issue is addressed

![image](https://user-images.githubusercontent.com/1107620/49493825-e083f400-f8a0-11e8-8b4a-cf17e370282e.png)

GenerateFlowFile generates content:

```
key,value
on[e,1
[two,2
```

![image](https://user-images.githubusercontent.com/1107620/49493836-e8439880-f8a0-11e8-8e89-f07db2712690.png)

Result

1.8.0

Regex compilation error as reported

![image](https://user-images.githubusercontent.com/1107620/49493844-f09bd380-f8a0-11e8-822f-f747f3184fc5.png)

1.9.0-SNAPSHOT

The square brackets are converted successfully

![image](https://user-images.githubusercontent.com/1107620/49493863-03160d00-f8a1-11e8-8f11-ac95670412ed.png)

---

Thank you for submitting a contribution to Apache NiFi.
In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:

### For all changes:

- [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
- [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
- [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
- [ ] Is your initial contribution a single, squashed commit?

### For code changes:

- [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?

### For documentation related changes:

- [ ] Have you ensured that format looks appropriate for the output in which it is rendered?

### Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
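The regex compilation error from the UpdateRecord test can be reproduced outside NiFi. The following is a minimal sketch, not the actual lexer code: `currentFallback` and `proposedFallback` are hypothetical helper names that mimic the fallback branch of the `ESC` fragment (grammar line 155) and the single back-slash variant proposed in this PR. It assumes the text produced by the lexer reaches `java.util.regex.Pattern` unchanged.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscFallbackDemo {

    // Mimics grammar line 155 as it is today: two back-slashes plus the next character.
    static String currentFallback(int nextChar) {
        StringBuilder buf = new StringBuilder();
        buf.append("\\\\").appendCodePoint(nextChar);
        return buf.toString();
    }

    // Mimics the proposed behaviour: a single back-slash plus the next character.
    static String proposedFallback(int nextChar) {
        StringBuilder buf = new StringBuilder();
        buf.append("\\").appendCodePoint(nextChar);
        return buf.toString();
    }

    // Returns true if the given text is a valid Java regular expression.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String current = currentFallback('[');   // the three characters \\[
        String proposed = proposedFallback('['); // the two characters \[

        // \\[ is a literal back-slash followed by an unclosed character class.
        System.out.println(current + " compiles: " + compiles(current));
        // \[ is an escaped literal square bracket.
        System.out.println(proposed + " compiles: " + compiles(proposed));

        // With the single back-slash form the brackets in the test data are removed.
        System.out.println("on[e".replaceAll(proposed, ""));
        System.out.println("[two".replaceAll(proposed, ""));
    }
}
```

With the double back-slash form the pattern is rejected before any replacement can run, which matches the 1.8.0 result; the single back-slash form leaves the unescaping to the regex engine, as the Details section argues it should.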
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ijokarumawak/nifi NIFI-5826

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/3200.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3200

----
commit ccb6f9265d99850debfe56f5bb0849ae9814a6d4
Author: Koji Kawamura <ijokarumawak@...>
Date:   2018-12-05T06:03:21Z

    NIFI-5826 Fix back-slash escaping at Lexers

----

---