markap14 commented on issue #3850: NIFI-6398 Added the 'replace first' and 
'replace all' strategy to ReplaceText
URL: https://github.com/apache/nifi/pull/3850#issuecomment-547131105
 
 
   @HorizonNet can you explain the difference between "Replace All" and 
"Regex"? I believe they are intended to accomplish the same thing, but the 
existing Regex strategy lets the user switch the evaluation mode between 
Line-by-Line and Entire Text. Line-by-Line mode is preferred when the regex 
does not span multiple lines because it uses dramatically less heap. The Regex 
strategy also allows for back references, etc. Does the new "Replace All" 
provide a capability that isn't currently supported and that I'm just missing?
   
   The Replace First is a nice addition. We should ensure, though, that any 
time we create a `String` from a `byte[]` we pass the character set to the 
constructor. It looks like the character set is used when serializing the 
`String` back to a `byte[]` but not when creating the `String` to begin with.
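
   A minimal sketch of the charset point above, using only the JDK. The class 
and method names (`CharsetAwareReplace`, `replaceFirstLiteral`) are 
hypothetical and are not taken from the pull request; the sketch only shows 
decoding and encoding with the same explicit `Charset`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetAwareReplace {

    // Hypothetical helper, not NiFi code: replaces the first literal match.
    static byte[] replaceFirstLiteral(byte[] content, String search,
                                      String replacement, Charset charset) {
        // Decode with an explicit charset instead of new String(content),
        // which would fall back to the JVM's default charset.
        String text = new String(content, charset);
        String replaced = text.replaceFirst(Pattern.quote(search),
                Matcher.quoteReplacement(replacement));
        // Encode with the same charset so the round trip is symmetric.
        return replaced.getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] input = "a-a".getBytes(StandardCharsets.UTF_8);
        byte[] output = replaceFirstLiteral(input, "a", "b", StandardCharsets.UTF_8);
        System.out.println(new String(output, StandardCharsets.UTF_8));
    }
}
```

   Passing the charset on both sides keeps non-ASCII content intact even when 
the JVM default charset differs from the one configured on the processor.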
   
   Finally, rather than using `IOUtils.toByteArray()`, I would recommend 
creating a `byte[]` of the FlowFile's size and then using 
`StreamUtils.fillBuffer`, as is done in the `RegexReplace` strategy. Because we 
already know the size of the FlowFile, this is far more efficient: it doesn't 
have to keep filling a buffer, creating a new one, and copying bytes over. It 
can also cut the amount of heap used to store the buffered data by up to 50%. 
The `ByteArrayOutputStream` doubles the buffer size every time it runs out of 
space, so if it's already 512 KB and we need one more byte, it creates a 1 MB 
buffer just to hold 512 KB + 1 byte, whereas a direct allocation takes exactly 
the right number of bytes.
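
   The direct-allocation approach can be sketched with plain JDK calls as 
follows. `ExactSizeBuffer.readExactly` is a hypothetical stand-in for the 
`fillBuffer` helper mentioned above, not NiFi's implementation, and it assumes 
the content size is known up front and fits in an `int`.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ExactSizeBuffer {

    // Hypothetical helper: reads exactly `size` bytes into a buffer that is
    // allocated once, so no intermediate buffers are grown or copied.
    static byte[] readExactly(InputStream in, int size) throws IOException {
        byte[] buffer = new byte[size];
        int offset = 0;
        while (offset < size) {
            // read() may return fewer bytes than requested, so loop until full.
            int n = in.read(buffer, offset, size - offset);
            if (n < 0) {
                throw new EOFException("Stream ended after " + offset
                        + " bytes, expected " + size);
            }
            offset += n;
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "hello world".getBytes(StandardCharsets.UTF_8);
        // In a processor, the size would come from the FlowFile (assumption).
        byte[] copy = readExactly(new ByteArrayInputStream(content), content.length);
        System.out.println(copy.length);
    }
}
```

   The buffer is sized once from the known length, so the peak heap used for 
the content is the content size itself rather than a doubled backing array.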

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
