Following up on this, I am not sure replace is needed at all. ReplaceAll, if called with a “regular expression” that is just a literal String, will perform the same operation as replace would.
In NIFI-1919, I will not remove replace, as that would break BC, but I submit that for 1.0, we should remove it, as it’s behavior is completely covered by another method. We would then have: replaceFirst - accepts a literal or regex replaceAll - accepts a literal or regex Again, we could perform “flow upgrade” through the StandardFlowSynchronizer to migrate all uses of “replace” to “replaceAll". Thoughts? Andy LoPresto alopre...@apache.org alopresto.apa...@gmail.com PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > On May 26, 2016, at 10:28 AM, Joe Skora <jsk...@gmail.com> wrote: > > Got it. That makes sense and provides all the variations of functionality > so it sounds like a win! > > On Thu, May 26, 2016 at 1:21 PM, Andy LoPresto <alopre...@apache.org > <mailto:alopre...@apache.org>> wrote: > >> Hi Joe, >> >> I think what you proposed is more than is necessary. replaceAll is >> functionally complete already, as if it is passed a literal expression, it >> will compile it as such and replace all instances in the subject string. I >> think the confusing point is that the difference between replace and >> replaceAll is not the number of times the replacement is attempted, but >> rather the literal vs. regular expression compilation. As Joe Percivall >> suggested, I think three methods (replace(literal), >> replaceFirst(regex/literal), and replaceAll(regex/literal)) are sufficient. >> >> Having the replaceRegex and replaceLiteral functions with an additional >> parameter to control occurrence count would be difficult, as we would >> either need to expose some kind of enum or perform string matching to >> interpret user-entered flag values. >> >> I don’t think replace and replaceAll need to be EOL-ed, I think the >> documentation just needs to call out the behavior very explicitly, and I >> believe having a new option replaceFirst will also help to suggest the >> differences between the methods even if users do not read the entire >> documentation. >> >> >> Andy LoPresto >> alopre...@apache.org >> *alopresto.apa...@gmail.com <mailto:alopresto.apa...@gmail.com> >> <alopresto.apa...@gmail.com <mailto:alopresto.apa...@gmail.com>>* >> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 >> >> On May 26, 2016, at 10:09 AM, Joe Skora <jsk...@gmail.com >> <mailto:jsk...@gmail.com>> wrote: >> >> I think there is transitional path that is non-breaking and could prevent >> 0.x to 1.x migration issues, even if at the expense of a little extra code >> for a period of time. >> >> 1. Leave the existing "replace" and "replaceAll" functions as is for >> now. >> 2. Implement the new "replaceRegex" and "replaceLiteral" functions. I'd >> prefer giving them a parameter selecting first, last, or all occurrences >> over function variants, it could even be optional with all as the >> default. >> 3. In a future release deprecate the "replace" and "replaceAll" >> functions. >> 4. In a future, future release remove the "replace" and "replaceAll" >> >> functions. >> >> From past experience managing enterprise systems, users preferred a >> transition period like this instead of requiring everything be updated at >> once. We could then provide them a means to find any outstanding use of >> the old functionality and clean it up before it was retired. And from the >> system development and management perspective, that saved us a lot of pain >> too since we didn't have an influx of minor but troublesome issues in the >> midst transitions that could demand our attention be on other bigger >> problems. >> >> It would be possible to have a signle function that takes another parameter >> indicating whether the target is a literal or regular expression, but I >> like the separate functions, especially if they will be implemented with >> different underlying APIs. >> >> On Wed, May 25, 2016 at 10:43 PM, Andy LoPresto <alopre...@apache.org >> <mailto:alopre...@apache.org>> >> wrote: >> >> Joe, >> >> These would be breaking changes and a lot of existing workflows would >> begin to behave differently. I would suggest making an incremental change >> here — simply adding replaceFirst as a non-destructive change as a solution >> for this issue, and opening a new Jira for the changes which break backward >> compatibility. >> >> I do agree that option 1 is probably the cleaner way forward, and if we >> introduce new method names, we may be able to use StandardFlowSynchronizer >> to detect legacy methods from a pre-1.0 flow and update them automatically >> during flow migration. >> >> Andy LoPresto >> alopre...@apache.org <mailto:alopre...@apache.org> >> *alopresto.apa...@gmail.com <mailto:alopresto.apa...@gmail.com> >> <alopresto.apa...@gmail.com <mailto:alopresto.apa...@gmail.com>>* >> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 >> >> On May 25, 2016, at 4:06 PM, Joe Percivall <joeperciv...@yahoo.com.INVALID >> <mailto:joeperciv...@yahoo.com.invalid> >> <joeperciv...@yahoo.com.invalid <mailto:joeperciv...@yahoo.com.invalid>> >> <joeperciv...@yahoo.com.invalid <mailto:joeperciv...@yahoo.com.invalid>>> >> wrote: >> >> Andy, >> >> Nice write-up and thanks for bringing attention to this. I definitely >> assumed for a while that replace vs replaceAll was the number of things >> replaced. The underlying problem, I think, is that these EL methods are >> just wrappers around the Java String methods and the Java String methods >> are named in a confusing manner. >> >> I am on board with adding a "replaceFirst(regex, replacement)" method. >> This adds a bit more functionality and is just exposing another Java String >> method. >> >> In addition to that, I would suggest doing something to alleviate the >> confusion between "replace" and "replaceAll". In a similar fashion to >> adding decimal support, I see two avenues we could take: >> >> 1. Change the names of the functions to "replaceLiteral" and >> "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex") >> 2. Remove one of the methods and add a third field to the remaining method >> to indicate whether to replace literally or interpret as a regex >> >> Both of these would be breaking changes and introduced with 1.0 and I am >> leaning towards option 1 with the base name "replace". I believe when the >> "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would >> be easy to understand that they replace all occurrences. >> >> Joe >> >> - - - - - - >> Joseph Percivall >> linkedin.com/in/Percivall <http://linkedin.com/in/Percivall> >> e: joeperciv...@yahoo.com <mailto:joeperciv...@yahoo.com> >> >> >> >> On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <alopre...@apache.org >> <mailto:alopre...@apache.org>> >> wrote: >> >> >> >> Hi all, >> >> During investigation of an expression language issue posted to the list, I >> discovered that replace explicitly delegates to a String#replace invocation >> that only accepts literal expressions, not regular expressions, while >> replaceAll accepts regular expressions. I thought this was an oversight and >> filed NIFI-1919 [1] to document and fix this, by changing the >> ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular >> expressions. I wrote failing unit tests [3] to capture the fix. After >> implementing the change, two existing unit tests [4] broke, which indicated >> a regression. At first, I believed these two tests to be incorrect, but >> further investigation showed they were merely surprising. >> >> TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call, >> but the test is asserting that replace should replace multiple instances of >> the single quote. While this is similar to String#replace, because the >> expression language exposes only two methods — replace vs. replaceAll — one >> could easily assume the difference between the two was the number of >> attempted replacements, rather than the actual difference, which is literal >> expression vs. pattern. >> >> @Test >> public void testQuotingQuotes() { >> final Map<String, String> attributes = new HashMap<>(); >> attributes.put("xx", "say 'hi'"); >> >> String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}"; >> verifyEquals(query, attributes, "say \"hello\""); >> >> query = "${xx:replace( \"'\", '\"')}"; >> verifyEquals(query, attributes, "say \"hi\""); >> >> query = "${xx:replace( '\\'' <smb://''>, '\"')}"; >> System.out.println(query); >> verifyEquals(query, attributes, "say \"hi\""); >> } >> TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails >> on the first verifyEquals call with a PatternSyntaxException. I am >> investigating that further. >> >> @Test >> public void testReplaceAllWithOddNumberOfBackslashPairs() { >> final Map<String, String> attributes = new HashMap<>(); >> attributes.put("filename", "C:\\temp\\.txt"); >> >> verifyEquals("${filename:replace('\\\\', '/')}", attributes, >> "C:/temp/.txt"); >> verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes, >> "C:/temp/.txt"); >> verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes, >> "C:\\temp"); >> } >> While I originally had just modified replace, after looking at the EL >> documentation [5], replace is explicitly documented to only replace literal >> expressions, and does so multiple times, as does Java’s String#replace [6]. >> I now propose to add another method replaceFirst, which accepts a pattern >> and replaces only the first occurrence. I will update the unit tests to >> properly capture this, and will update the documentation to reflect the new >> method. >> >> Thoughts from the community? >> >> [1] https://issues.apache.org/jira/browse/NIFI-1919 >> <https://issues.apache.org/jira/browse/NIFI-1919> >> [2] >> >> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java >> >> <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java> >> [3] >> >> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy >> >> <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy> >> [4] >> >> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java >> >> <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java> >> [5] >> >> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace >> >> <https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace> >> [6] >> >> https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence) >> >> <https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)> >> >> >> >> Andy LoPresto >> alopre...@apache.org <mailto:alopre...@apache.org> >> alopresto.apa...@gmail.com <mailto:alopresto.apa...@gmail.com> >> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
signature.asc
Description: Message signed with OpenPGP using GPGMail