Matt Burgess created NIFI-4550:
----------------------------------

             Summary: Add an InferCharacterSet processor
                 Key: NIFI-4550
                 URL: https://issues.apache.org/jira/browse/NIFI-4550
             Project: Apache NiFi
          Issue Type: New Feature
          Components: Extensions
            Reporter: Matt Burgess
            Priority: Minor

Sometimes in a NiFi flow it is not known what character set an incoming flow file is using. This can make downstream processing difficult if the processors expect a particular charset (whether or not the user can configure it). There is a ConvertCharacterSet processor, but it requires an explicit value for Input Character Set, which may not be known.

I propose an InferCharacterSet processor, which would presumably use some license-friendly third-party library (there is a discussion [here|https://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream]) to guess the character set, perhaps adding it as an attribute for use downstream in ConvertCharacterSet.
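
To illustrate the kind of guessing such a processor might do, here is a minimal sketch that uses only the JDK instead of a third-party library. It checks for a byte-order mark, then tries a strict decode against a short candidate list. The class name CharsetGuesser, the method names, and the candidate list are all hypothetical and not part of NiFi. A real implementation would likely delegate to a statistical detector, since this approach cannot distinguish between single-byte encodings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of NiFi: a JDK-only heuristic for guessing a charset.
final class CharsetGuesser {

    // Assumed candidate order: the most restrictive charset first.
    private static final List<Charset> CANDIDATES =
            List.of(StandardCharsets.US_ASCII, StandardCharsets.UTF_8);

    // Returns a guess, or empty if no BOM is present and no candidate decodes cleanly.
    static Optional<Charset> guess(byte[] content) {
        // 1. A byte-order mark is the strongest signal available.
        //    (This sketch does not handle UTF-32 BOMs.)
        if (startsWith(content, 0xEF, 0xBB, 0xBF)) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (startsWith(content, 0xFE, 0xFF)) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (startsWith(content, 0xFF, 0xFE)) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        // 2. Otherwise pick the first candidate that decodes without errors.
        for (Charset candidate : CANDIDATES) {
            if (decodesCleanly(content, candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // True if the data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Strict decode: malformed or unmappable input raises an exception
    // instead of being silently replaced.
    private static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

In a processor, the result of CharsetGuesser.guess(bytes) could be written to a flow file attribute (the attribute name would be a design decision), which ConvertCharacterSet could then reference for its Input Character Set, assuming that property accepts attribute expressions.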