Github user patricker commented on the issue:

    https://github.com/apache/nifi/pull/883

Matt, 'ascii' doesn't quite mean what it appears to :smile:. If you convert an array of random bytes to UTF8 and then back to bytes, you will find you have considerably more bytes than you started with. This is because, in order to represent certain bytes as actual characters, UTF8 has to insert extra marker bytes; when you convert back to bytes, these extra bytes come back too. I've seen the conversion of binary data -> UTF8 -> binary data grow by 40%.

The key thing to remember for this processor is that the data coming in only looks like text because it is contained in an attribute; it's not actually text, it's raw bytes. In this processor 'ascii' means that each character represents a single byte; calling string.getBytes("ASCII") is just a handy shortcut in Java to get this functionality. I can rename it to 'onecharperbyte' if that makes more sense.
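
To make the round-trip point concrete, here is a minimal sketch rather than the processor's actual code. The class and method names (`ByteRoundTrip`, `roundTrip`, `allByteValues`) are made up for illustration. It pushes all 256 byte values through bytes -> String -> bytes under three charsets. One caveat: Java's US-ASCII charset only covers 0x00 to 0x7F, so the sketch also tries ISO-8859-1 as a true one-char-per-byte mapping. Whether the processor should use it is an assumption on my part.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of NiFi: shows what a bytes -> String -> bytes
// round trip does to raw binary data under different charsets.
final class ByteRoundTrip {

    // Decode raw bytes to a String with the given charset, then encode back.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    // Every possible byte value, 0x00 through 0xFF, as stand-in "binary data".
    static byte[] allByteValues() {
        byte[] raw = new byte[256];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) i;
        }
        return raw;
    }

    public static void main(String[] args) {
        byte[] raw = allByteValues();
        Charset[] charsets = {
            StandardCharsets.UTF_8,      // invalid sequences become U+FFFD, so output grows
            StandardCharsets.US_ASCII,   // bytes above 0x7F are not preserved
            StandardCharsets.ISO_8859_1  // each byte maps to exactly one char and back
        };
        for (Charset cs : charsets) {
            byte[] back = roundTrip(raw, cs);
            System.out.printf("%-10s in=%d out=%d identical=%b%n",
                    cs.name(), raw.length, back.length, Arrays.equals(raw, back));
        }
    }
}
```

With this input, the UTF-8 round trip comes back longer than it went in. The US-ASCII round trip keeps the length but changes the high bytes. Only ISO-8859-1 returns the original bytes unchanged.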