Hi Joe,

I think what you proposed is more than is necessary. replaceAll is functionally 
complete already, as if it is passed a literal expression, it will compile it 
as such and replace all instances in the subject string. I think the confusing 
point is that the difference between replace and replaceAll is not the number 
of times the replacement is attempted, but rather the literal vs. regular 
expression compilation. As Joe Percivall suggested, I think three methods 
(replace(literal), replaceFirst(regex/literal), and replaceAll(regex/literal)) 
are sufficient.

Having the replaceRegex and replaceLiteral functions with an additional 
parameter to control occurrence count would be difficult, as we would either 
need to expose some kind of enum or perform string matching to interpret 
user-entered flag values.

I don’t think replace and replaceAll need to be EOL-ed, I think the 
documentation just needs to call out the behavior very explicitly, and I 
believe having a new option replaceFirst will also help to suggest the 
differences between the methods even if users do not read the entire 
documentation.


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On May 26, 2016, at 10:09 AM, Joe Skora <jsk...@gmail.com> wrote:
> 
> I think there is transitional path that is non-breaking and could prevent
> 0.x to 1.x migration issues, even if at the expense of a little extra code
> for a period of time.
> 
>   1. Leave the existing "replace" and "replaceAll" functions as is for
>   now.
>   2. Implement the new "replaceRegex" and "replaceLiteral" functions.  I'd
>   prefer giving them a parameter selecting first, last, or all occurrences
>   over function variants, it could even be optional with all as the default.
>   3. In a future release deprecate the "replace" and "replaceAll"
>   functions.
>   4. In a future, future release remove the "replace" and "replaceAll"
>   functions.
> 
> From past experience managing enterprise systems, users preferred a
> transition period like this instead of requiring everything be updated at
> once.  We could then provide them a means to find any outstanding use of
> the old functionality and clean it up before it was retired.  And from the
> system development and management perspective, that saved us a lot of pain
> too since we didn't have an influx of minor but troublesome issues in the
> midst transitions that could demand our attention be on other bigger
> problems.
> 
> It would be possible to have a signle function that takes another parameter
> indicating whether the target is a literal or regular expression, but I
> like the separate functions, especially if they will be implemented with
> different underlying APIs.
> 
> On Wed, May 25, 2016 at 10:43 PM, Andy LoPresto <alopre...@apache.org 
> <mailto:alopre...@apache.org>>
> wrote:
> 
>> Joe,
>> 
>> These would be breaking changes and a lot of existing workflows would
>> begin to behave differently. I would suggest making an incremental change
>> here — simply adding replaceFirst as a non-destructive change as a solution
>> for this issue, and opening a new Jira for the changes which break backward
>> compatibility.
>> 
>> I do agree that option 1 is probably the cleaner way forward, and if we
>> introduce new method names, we may be able to use StandardFlowSynchronizer
>> to detect legacy methods from a pre-1.0 flow and update them automatically
>> during flow migration.
>> 
>> Andy LoPresto
>> alopre...@apache.org
>> *alopresto.apa...@gmail.com <mailto:alopresto.apa...@gmail.com> 
>> <alopresto.apa...@gmail.com <mailto:alopresto.apa...@gmail.com>>*
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> 
>> On May 25, 2016, at 4:06 PM, Joe Percivall <joeperciv...@yahoo.com.INVALID 
>> <mailto:joeperciv...@yahoo.com.invalid>
>> <joeperciv...@yahoo.com.invalid <mailto:joeperciv...@yahoo.com.invalid>>> 
>> wrote:
>> 
>> Andy,
>> 
>> Nice write-up and thanks for bringing attention to this. I definitely
>> assumed for a while that replace vs replaceAll was the number of things
>> replaced. The underlying problem, I think, is that these EL methods are
>> just wrappers around the Java String methods and the Java String methods
>> are named in a confusing manner.
>> 
>> I am on board with adding a "replaceFirst(regex, replacement)" method.
>> This adds a bit more functionality and is just exposing another Java String
>> method.
>> 
>> In addition to that, I would suggest doing something to alleviate the
>> confusion between "replace" and "replaceAll". In a similar fashion to
>> adding decimal support, I see two avenues we could take:
>> 
>> 1. Change the names of the functions to "replaceLiteral" and
>> "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex")
>> 2. Remove one of the methods and add a third field to the remaining method
>> to indicate whether to replace literally or interpret as a regex
>> 
>> Both of these would be breaking changes and introduced with 1.0 and I am
>> leaning towards option 1 with the base name "replace". I believe when the
>> "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would
>> be easy to understand that they replace all occurrences.
>> 
>> Joe
>> 
>> - - - - - -
>> Joseph Percivall
>> linkedin.com/in/Percivall <http://linkedin.com/in/Percivall>
>> e: joeperciv...@yahoo.com <mailto:joeperciv...@yahoo.com>
>> 
>> 
>> 
>> On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <alopre...@apache.org 
>> <mailto:alopre...@apache.org>>
>> wrote:
>> 
>> 
>> 
>> Hi all,
>> 
>> During investigation of an expression language issue posted to the list, I
>> discovered that replace explicitly delegates to a String#replace invocation
>> that only accepts literal expressions, not regular expressions, while
>> replaceAll accepts regular expressions. I thought this was an oversight and
>> filed NIFI-1919 [1] to document and fix this, by changing the
>> ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular
>> expressions. I wrote failing unit tests [3] to capture the fix. After
>> implementing the change, two existing unit tests [4] broke, which indicated
>> a regression. At first, I believed these two tests to be incorrect, but
>> further investigation showed they were merely surprising.
>> 
>> TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call,
>> but the test is asserting that replace should replace multiple instances of
>> the single quote. While this is similar to String#replace, because the
>> expression language exposes only two methods — replace vs. replaceAll — one
>> could easily assume the difference between the two was the number of
>> attempted replacements, rather than the actual difference, which is literal
>> expression vs. pattern.
>> 
>> @Test
>> public void testQuotingQuotes() {
>> final Map<String, String> attributes = new HashMap<>();
>> attributes.put("xx", "say 'hi'");
>> 
>> String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}";
>> verifyEquals(query, attributes, "say \"hello\"");
>> 
>> query = "${xx:replace( \"'\", '\"')}";
>> verifyEquals(query, attributes, "say \"hi\"");
>> 
>> query = "${xx:replace( '\\'' <smb://''>, '\"')}";
>> System.out.println(query);
>> verifyEquals(query, attributes, "say \"hi\"");
>> }
>> TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails
>> on the first verifyEquals call with a PatternSyntaxException. I am
>> investigating that further.
>> 
>> @Test
>> public void testReplaceAllWithOddNumberOfBackslashPairs() {
>> final Map<String, String> attributes = new HashMap<>();
>> attributes.put("filename", "C:\\temp\\.txt");
>> 
>> verifyEquals("${filename:replace('\\\\', '/')}", attributes,
>> "C:/temp/.txt");
>> verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes,
>> "C:/temp/.txt");
>> verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes,
>> "C:\\temp");
>> }
>> While I originally had just modified replace, after looking at the EL
>> documentation [5], replace is explicitly documented to only replace literal
>> expressions, and does so multiple times, as does Java’s String#replace [6].
>> I now propose to add another method replaceFirst, which accepts a pattern
>> and replaces only the first occurrence. I will update the unit tests to
>> properly capture this, and will update the documentation to reflect the new
>> method.
>> 
>> Thoughts from the community?
>> 
>> [1] https://issues.apache.org/jira/browse/NIFI-1919 
>> <https://issues.apache.org/jira/browse/NIFI-1919>
>> [2]
>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java
>>  
>> <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java>
>> [3]
>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy
>>  
>> <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy>
>> [4]
>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java
>>  
>> <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java>
>> [5]
>> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace
>>  
>> <https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace>
>> [6]
>> https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)
>>  
>> <https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)>
>> 
>> 
>> 
>> Andy LoPresto
>> alopre...@apache.org <mailto:alopre...@apache.org>
>> alopresto.apa...@gmail.com <mailto:alopresto.apa...@gmail.com>
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

Attachment: signature.asc
Description: Message signed with OpenPGP using GPGMail

Reply via email to