GitHub user ijokarumawak opened a pull request:

    https://github.com/apache/nifi/pull/3200

    NIFI-5826 WIP Fix back-slash escaping at Lexers

    ## Summary
    The current Lexers convert a back-slash followed by another character
    (e.g. '\[') into a double back-slash followed by that character (e.g. '\\[').
    However, based on the detailed analysis below, this conversion appears to be
    wrong; the Lexers should leave such sequences as they are.
    
    ## Details
    I debugged how the Lexer works and found the following:
    
    - The `ESC` fragment handles an escaped special character in its String
      representation, i.e. the String `\t` is converted to an actual tab character.
    - String values that a user enters in the NiFi UI are passed to the
      `RecordPath.compile` method as they are. E.g. the input string
      `replaceRegex(/name, '\[', '')` is passed as is, and then the single
      back-slash is converted to a double back-slash by line 155 of the `ESC` fragment.
    - I believe lines 153-156 are meant to preserve escaped characters as they
      are, because such escapes don't mean anything in the RecordPath/AttrExpLang
      spec. They should be unescaped later by the underlying syntax, such as RegEx.
        - The current line 155 does this incorrectly. It should append a single
          back-slash.
        - Other Lexers (AttributeExpressionLexer.g and HL7QueryLexer.g) have
          the same issue.
    - So, I think we should fix all Lexers instead of adding another conversion.
    
    Here is the [Lexer code](https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record-path/src/main/antlr3/org/apache/nifi/record/path/RecordPathLexer.g#L143) for reference:
    ```
    143 fragment
    144 ESC
    145   :  '\\'
    146     (
    147         '"'    { setText("\""); }
    148       |  '\''  { setText("\'"); }
    149       |  'r'   { setText("\r"); }
    150       |  'n'   { setText("\n"); }
    151       |  't'   { setText("\t"); }
    152       |  '\\'  { setText("\\\\"); }
    153       |  nextChar = ~('"' | '\'' | 'r' | 'n' | 't' | '\\')
    154        {
    155          StringBuilder lBuf = new StringBuilder(); lBuf.append("\\\\").appendCodePoint(nextChar); setText(lBuf.toString());
    156        }
    157    )
    158  ;
    ```
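    
    To make the effect of the fallback branch (lines 153-156) concrete, here is a minimal Java sketch that reproduces only the string building done at line 155, next to a single back-slash variant. The class and method names (`EscFallbackSketch`, `currentFallback`, `proposedFallback`) are hypothetical and are not part of the NiFi code base; the real change belongs in the Lexer grammar files.
    
    ```java
    // Hypothetical stand-alone sketch; it only mimics the Java action embedded in the ESC fragment.
    class EscFallbackSketch {
    
        // Same string building as the current line 155: two back-slash characters, then the next character.
        static String currentFallback(int nextChar) {
            StringBuilder lBuf = new StringBuilder();
            lBuf.append("\\\\").appendCodePoint(nextChar);
            return lBuf.toString();
        }
    
        // Assumed shape of the fix: one back-slash character, then the next character.
        static String proposedFallback(int nextChar) {
            StringBuilder lBuf = new StringBuilder();
            lBuf.append("\\").appendCodePoint(nextChar);
            return lBuf.toString();
        }
    
        public static void main(String[] args) {
            System.out.println(currentFallback('['));   // prints \\[
            System.out.println(proposedFallback('['));  // prints \[
        }
    }
    ```
    
    With the single back-slash variant, the text the user typed (`\[`) reaches the underlying syntax unchanged.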
    
    ## NiFi template for test
    
    Here is a NiFi flow template for comparing the behavior before and after this change.
    https://gist.github.com/ijokarumawak/b6bdca8074a4457bc4a425b90a6b17f0
    
    To try the template, build this PR as NiFi 1.9.0-SNAPSHOT, then download
    the following 1.8.0 NARs into your SNAPSHOT's lib dir so that both versions
    can be used in the flow.
    
    - https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-standard-nar/1.8.0/nifi-standard-nar-1.8.0.nar
    - https://repo.maven.apache.org/maven2/org/apache/nifi/nifi-update-attribute-nar/1.8.0/nifi-update-attribute-nar-1.8.0.nar
    
    ## Test result
    
    ### UpdateAttribute test for backward compatibility
    
    
![image](https://user-images.githubusercontent.com/1107620/49493740-93078700-f8a0-11e8-9360-025254b39551.png)
    
    GenerateFlowFile generates FlowFiles with attribute `a` whose value is:
    ```
    this is new line
    and this is just a backslash \n
    ```
    
    
![image](https://user-images.githubusercontent.com/1107620/49493751-9c90ef00-f8a0-11e8-8d41-6005f01157a7.png)
    
    Result
    1.8.0
    
![image](https://user-images.githubusercontent.com/1107620/49493779-b5010980-f8a0-11e8-9911-22c0d71e865b.png)
    
    1.9.0-SNAPSHOT
    
![image](https://user-images.githubusercontent.com/1107620/49493786-baf6ea80-f8a0-11e8-8c04-7efb54167345.png)
    
    ### UpdateRecord test showing that the NIFI-5826 issue is addressed
    
![image](https://user-images.githubusercontent.com/1107620/49493825-e083f400-f8a0-11e8-8b4a-cf17e370282e.png)
    
    GenerateFlowFile generates content:
    ```
    key,value
    on[e,1
    [two,2
    ```
    
    
![image](https://user-images.githubusercontent.com/1107620/49493836-e8439880-f8a0-11e8-8e89-f07db2712690.png)
    
    Result
    1.8.0
    Regex compilation error as reported
    
![image](https://user-images.githubusercontent.com/1107620/49493844-f09bd380-f8a0-11e8-822f-f747f3184fc5.png)
    
    1.9.0-SNAPSHOT
    The square brackets are converted successfully
    
![image](https://user-images.githubusercontent.com/1107620/49493863-03160d00-f8a1-11e8-8f11-ac95670412ed.png)
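    
    The difference between the two results can be reproduced outside NiFi with plain `java.util.regex`. This is a sketch under two assumptions: that the regex argument of `replaceRegex` is eventually compiled as a Java regular expression, and that the expression used in the flow resembles the `replaceRegex(/name, '\[', '')` example from the Summary. The class and method names (`BracketRegexSketch`, `compiles`, `stripMatches`) are hypothetical.
    
    ```java
    import java.util.regex.Pattern;
    import java.util.regex.PatternSyntaxException;
    
    // Hypothetical stand-alone sketch; it does not call any NiFi API.
    class BracketRegexSketch {
    
        // Returns true if the given text is a valid Java regular expression.
        static boolean compiles(String regex) {
            try {
                Pattern.compile(regex);
                return true;
            } catch (PatternSyntaxException e) {
                return false;
            }
        }
    
        // Removes every match of the regex from the value.
        static String stripMatches(String regex, String value) {
            return Pattern.compile(regex).matcher(value).replaceAll("");
        }
    
        public static void main(String[] args) {
            String doubled = "\\\\["; // the text \\[ : an escaped back-slash followed by an unclosed '['
            String single = "\\[";    // the text \[  : an escaped '['
    
            System.out.println(compiles(doubled));             // prints false
            System.out.println(compiles(single));              // prints true
            System.out.println(stripMatches(single, "on[e"));  // prints one
            System.out.println(stripMatches(single, "[two"));  // prints two
        }
    }
    ```
    
    The doubled form fails because `\\` is read as a literal back-slash, which leaves `[` opening a character class that is never closed.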
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ijokarumawak/nifi NIFI-5826

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/3200.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3200
    
----
commit ccb6f9265d99850debfe56f5bb0849ae9814a6d4
Author: Koji Kawamura <ijokarumawak@...>
Date:   2018-12-05T06:03:21Z

    NIFI-5826 Fix back-slash escaping at Lexers

----

