Github user patricker commented on the issue:

    https://github.com/apache/nifi/pull/883
  
    Matt, 'ascii' doesn't quite mean what it appears to mean here :smile:.
    
    If you convert an array of random bytes to UTF8 and then back to bytes, you 
will find you have considerably more bytes than you started with.  This is 
because, in order to represent certain bytes as actual characters, UTF8 has to 
insert extra marker bytes; when you convert back to bytes, these extra 
bytes come back too.  I've seen the conversion of binary data -> UTF8 -> binary 
data grow by 40%.
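
    A minimal sketch of that round trip, not code from the pull request: the 
class name `Utf8RoundTrip` and the helper `roundTrip` are made up for 
illustration. In Java specifically, bytes that are not valid UTF8 decode to the 
replacement character U+FFFD, which encodes back as three bytes, so the output 
is both larger than and different from the input.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Hypothetical helper: decode raw bytes as UTF-8 text, then encode the text back to bytes.
    static byte[] roundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE never occur in valid UTF-8; 0x41 is a plain 'A'.
        byte[] raw = {(byte) 0xFF, (byte) 0xFE, 0x41};
        byte[] back = roundTrip(raw);
        System.out.println("original bytes:   " + raw.length);
        System.out.println("round-trip bytes: " + back.length);
    }
}
```

    How much the data grows depends on how many of the input bytes are not 
valid UTF8, so the ratio for real binary data will vary.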
    
    The key thing to remember for this processor is that the data coming in 
only looks like text because it is contained in an attribute; it's not actually 
text, it's raw bytes. In this processor 'ascii' means that each character 
represents a single byte; calling string.getBytes("ASCII") is just a handy 
shortcut in Java to get this functionality.
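
    To illustrate the one-character-per-byte idea, here is a hedged sketch 
that is not the processor's code: `OneCharPerByte`, `toText` and `toBytes` are 
hypothetical names. It uses ISO-8859-1 rather than "ASCII" as an assumption on 
my part, because Java's US-ASCII charset only covers values 0 to 127 and 
substitutes '?' for anything else when encoding, whereas ISO-8859-1 maps all 
256 byte values to exactly one character each.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OneCharPerByte {
    // Hypothetical helper: view raw bytes as a string, one character per byte.
    static String toText(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: recover the raw bytes, one byte per character.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value.
        byte[] raw = new byte[256];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }
        String text = toText(raw);
        byte[] back = toBytes(text);
        System.out.println("characters: " + text.length());
        System.out.println("lossless:   " + Arrays.equals(raw, back));
    }
}
```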
    
    I can rename it to 'onecharperbyte' if that makes more sense.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
