Github user ijokarumawak commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3183#discussion_r237764346
  
    --- Diff: 
nifi-commons/nifi-record-path/src/main/java/org/apache/nifi/record/path/util/RecordPathUtils.java
 ---
    @@ -39,4 +39,52 @@ public static String getFirstStringValue(final 
RecordPathSegment segment, final
     
             return stringValue;
         }
    +
    +    /**
    +     * This method handles backslash sequences after ANTLR parser converts 
all backslash into double ones
    +     * with exception for \t, \r and \n. See
    +     * <a 
href="file:../../../../../../../../../src/main/antlr3/org/apache/nifi/record/path/RecordPathParser.g">org/apache/nifi/record/path/RecordPathParser.g</a>
    --- End diff --
    
    @bdesert Thanks for the explanation. I read through your comment over and 
over, and tested different input string. However, that differs from what what 
I'm seeing.
    
    I debugged how Lexer works, too, and found that:
    
    - The `ESC` fragment handles an escaped special character in String 
representation. I.e. String `\t` will be converted to actual tab character.
    - The string values user input from NiFi UI are passed to 
`RecordPath.compile` method as it is. E.g. the input string 
`replaceRegex(/name, '\[', '')` is passed to as is, then the single back-slash 
is converted to double back-slash by the ESC fragment line 155.
    - Since the escaped characters such as `\n` are already unescaped by this 
Lexer, the `RecordPathUtils.unescapeBackslash` method added by this PR will not 
see any **escaped** characters other than the doubled back-slash `//`. And the 
doubled back-slash is converted back to a single back-slash by 
`RecordPathUtils.unescapeBackslash`.
    - I believe the line 153-156 is aimed to preserve escaped characters as it 
is, because such escape doesn't mean anything for the RecordPath/AttrExpLang 
spec. And those should be unescaped later by underlying syntaxes such as RegEx.
        - And current line 155 does it wrongly. It should append a single 
back-slash..
        - Other Lexers (AttributeExpressionLexer.g and HL7QueryLexer.g) have 
the same issue.
    - So, I think we should fix all Lexers instead of adding another conversion.
    
    Here is the [Lexer 
code](https://github.com/apache/nifi/blob/master/nifi-commons/nifi-record-path/src/main/antlr3/org/apache/nifi/record/path/RecordPathLexer.g#L143)
 for reference:
    ```
    143 fragment
    144 ESC
    145   :  '\\'
    146     (
    147         '"'    { setText("\""); }
    148       |  '\''  { setText("\'"); }
    149       |  'r'   { setText("\r"); }
    150       |  'n'   { setText("\n"); }
    151       |  't'   { setText("\t"); }
    152       |  '\\'  { setText("\\\\"); }
    153       |  nextChar = ~('"' | '\'' | 'r' | 'n' | 't' | '\\')
    154        {
    155          StringBuilder lBuf = new StringBuilder(); 
lBuf.append("\\\\").appendCodePoint(nextChar); setText(lBuf.toString());
    156        }
    157    )
    158  ;
    ```
    
    Can share any input String value that a user would set into a processor 
property text box (or passed to RecordPath.compile method), that will go the 
proposed `RecordPathUtils.unescapeBackslash` method lines of code for `\n`, 
`\r` or `\t`? Current test case does have regex test cases using those escaped 
characters, but coverage shows those branches are not used:
    
![image](https://user-images.githubusercontent.com/1107620/49273985-18162900-f4ba-11e8-8ac7-2600f8f8c698.png)



---

Reply via email to