Matt Burgess created NIFI-4550:
----------------------------------

             Summary: Add an InferCharacterSet processor
                 Key: NIFI-4550
                 URL: https://issues.apache.org/jira/browse/NIFI-4550
             Project: Apache NiFi
          Issue Type: New Feature
          Components: Extensions
            Reporter: Matt Burgess
            Priority: Minor


Sometimes in a NiFi flow, the character set of an incoming flow file is not 
known. This makes downstream processing difficult when processors expect a 
particular charset (whether or not the user can configure it). There is a 
ConvertCharacterSet processor, but it requires an explicit value for Input 
Character Set, which might not be known.

I propose an InferCharacterSet processor, which would presumably use a 
license-friendly third-party library (there is a discussion 
[here|https://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream]) 
to guess the character set, perhaps writing the result to a flow file 
attribute for use downstream in ConvertCharacterSet.
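
To make the idea concrete, here is a rough sketch of the guessing step. It is 
not the proposed implementation: it uses only the JDK as a stand-in for 
whatever detection library ends up being chosen, and the class name 
CharsetGuesser and its guess method are hypothetical. It checks for a byte 
order mark, then tries a strict UTF-8 decode, and otherwise returns a 
caller-supplied fallback.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of NiFi or any library.
final class CharsetGuesser {

    private CharsetGuesser() {}

    /**
     * Returns a best-guess charset name for the given content.
     * Assumption: the caller passes the complete content (or a sample that
     * does not end in the middle of a multi-byte sequence).
     */
    static String guess(byte[] content, Charset fallback) {
        // 1. A byte order mark, when present, identifies the encoding directly.
        //    (UTF-32 marks are deliberately ignored to keep the sketch short.)
        if (startsWith(content, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(content, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(content, 0xFF, 0xFE)) return "UTF-16LE";

        // 2. Bytes that decode as strict UTF-8 are very likely UTF-8
        //    (pure ASCII content also lands here).
        if (decodesStrictly(content, StandardCharsets.UTF_8)) return "UTF-8";

        // 3. Otherwise we cannot tell; a real detector would apply
        //    statistical heuristics at this point.
        return fallback.name();
    }

    // True if the bytes decode with no malformed or unmappable input.
    private static boolean decodesStrictly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Compares the leading bytes (as unsigned values) against a prefix.
    private static boolean startsWith(byte[] bytes, int... prefix) {
        if (bytes.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((bytes[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

In a processor, the returned name could be written to a flow file attribute 
(the attribute name would be up to the implementation) and then referenced 
from ConvertCharacterSet's Input Character Set property, assuming that 
property accepts Expression Language.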



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
