[ https://issues.apache.org/jira/browse/NIFI-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598996#comment-16598996 ]
ASF GitHub Bot commented on NIFI-5147:
--------------------------------------

GitHub user ottobackwards commented on the issue:

    https://github.com/apache/nifi/pull/2836

    OK, that is great. Did I miss something that _was_ in the JIRA? Should I just close this PR now then?

> Improve HashAttribute processor
> -------------------------------
>
>                 Key: NIFI-5147
>                 URL: https://issues.apache.org/jira/browse/NIFI-5147
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: 1.6.0
>            Reporter: Andy LoPresto
>            Assignee: Otto Fowler
>            Priority: Major
>              Labels: hash, security
>             Fix For: 1.8.0
>
>
> The {{HashAttribute}} processor currently has surprising behavior. Unless already familiar with the processor, a user would expect {{HashAttribute}} to generate a hash value over one or more attributes. Instead, the processor as implemented "groups" incoming flowfiles based on regular expressions that match attribute values, and then generates a (non-configurable) MD5 hash over the concatenation of the matching attribute keys and values.
> In addition:
> * the processor throws an error and routes to failure any incoming flowfile that does not have all attributes specified in the processor
> * MD5 is widely deprecated
> * no other hash algorithms are available
> I am unaware of community use of this processor, but I do not want to break backward compatibility.
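To make the current behavior described above concrete, here is a minimal Python sketch of it. It is an approximation written from the issue text, not NiFi's Java source: the function name `legacy_group_hash`, the full-match regex semantics, the UTF-8 encoding, and the sorted key order are all assumptions.

```python
import hashlib
import re
from typing import Dict, Optional


def legacy_group_hash(attributes: Dict[str, str],
                      patterns: Dict[str, str]) -> Optional[str]:
    """Approximate the current HashAttribute behavior described in NIFI-5147.

    `patterns` maps an attribute name to a regular expression (a hypothetical
    stand-in for the processor's dynamic properties).
    Returns None if any configured attribute is missing, mirroring the
    route-to-failure behavior; otherwise returns an MD5 hex digest.
    """
    matching = {}
    for name, regex in patterns.items():
        value = attributes.get(name)
        if value is None:
            return None  # missing attribute: flowfile would go to failure
        if re.fullmatch(regex, value):
            matching[name] = value  # only matching attributes feed the hash
    # Assumption: matching keys are concatenated in sorted order, as key + value.
    joined = "".join(key + value for key, value in sorted(matching.items()))
    # MD5 is fixed; there is no algorithm property to change it.
    return hashlib.md5(joined.encode("utf-8")).hexdigest()
```

Two flowfiles whose matching attribute values agree receive the same digest, which is why the output behaves like a grouping key rather than a hash of a chosen attribute.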
> I propose the following steps:
> * Implement a new {{CalculateAttributeHash}} processor (an awkward name, but the existing processor already has the desired name)
> ** This processor will perform the "standard" use case: identify an attribute, calculate the specified hash over its value, and write the result to an output attribute
> ** This processor will have a required property descriptor offering a dropdown menu of valid hash algorithms
> ** This processor will accept arbitrary dynamic properties, each with the name of the attribute to hash as the key and the name of the resulting attribute as the value
> ** Example: I want to generate a SHA-512 hash on the attribute {{username}}, and a flowfile enters the processor with {{username}} value {{alopresto}}. I configure {{algorithm}} with {{SHA-512}} and add a dynamic property with name {{username}} and value {{username_SHA512}}. The resulting flowfile will have attribute {{username_SHA512}} with value {{739b4f6722fb5de20125751c7a1a358b2a7eb8f07e530e4bf18561fbff93234908aa9d2577770c876bca9ede5ba784d5ce6081dbbdfe5ddd446678f223b8d632}}
> * Improve the documentation of the existing {{HashAttribute}} processor to explain its goal and expected use case (?)
> * Link from the {{HashAttribute}} documentation to the new processor for standard use cases
> * Remove the error alert when an incoming flowfile does not contain all expected attributes. I propose changing the severity to INFO and still routing to failure



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
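The proposed {{CalculateAttributeHash}} behavior can be sketched as follows. This is a minimal Python illustration under stated assumptions, not NiFi's processor API: the function name `calculate_attribute_hash`, the `ALGORITHMS` table (standing in for the algorithm dropdown), the UTF-8 encoding, lowercase hex output, and the `("success" | "failure", attributes)` return shape are all hypothetical.

```python
import hashlib
import logging
from typing import Dict, Tuple

log = logging.getLogger("calculate_attribute_hash_sketch")

# Hypothetical dropdown values mapped to hashlib algorithm names.
ALGORITHMS = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-512": "sha512",
}


def calculate_attribute_hash(attributes: Dict[str, str],
                             algorithm: str,
                             dynamic_props: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Hash each source attribute into its target attribute.

    `dynamic_props` maps source attribute name -> output attribute name,
    mirroring the proposed dynamic properties. Returns (relationship, attributes).
    """
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    missing = [name for name in dynamic_props if name not in attributes]
    if missing:
        # Proposal: log at INFO instead of ERROR, but still route to failure.
        log.info("missing attributes %s; routing to failure", missing)
        return "failure", dict(attributes)
    updated = dict(attributes)
    for source, target in dynamic_props.items():
        # Assumption: values are hashed as UTF-8 bytes, output as lowercase hex.
        digest = hashlib.new(ALGORITHMS[algorithm], attributes[source].encode("utf-8"))
        updated[target] = digest.hexdigest()
    return "success", updated
```

Running it with the issue's example (`{"username": "alopresto"}`, `"SHA-512"`, `{"username": "username_SHA512"}`) routes to success and writes a 128-character hex digest to `username_SHA512`. Keeping the missing-attribute check up front means a flowfile is never partially hashed before being routed to failure.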