[ https://issues.apache.org/jira/browse/NIFI-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753453#comment-17753453 ]
Peter Kimberley commented on NIFI-11945:
----------------------------------------

In the process of reviewing this processor, I identified the following additional problems, which are resolved in the referenced PR:

h2. Other issues resolved
 # Expression language attributes `field.name`, `field.value` and `field.type` are referenced in the documentation but not implemented. This can be confusing for users of this processor. These attributes are removed in favour of a simpler `RecordPath` syntax in dynamic properties.
 # Typos and confusing documentation (e.g. saying deduplication only works on a per-file basis in one area, while contradicting this in another).
 # Reliance on map cache values being put separately from the existence check. This is non-atomic, so it is not safe when run using multiple workers. Now using the `DistributedMapCacheClient::putIfAbsent()` method to achieve atomicity.
 # NPE when attempting to reference a non-existent record field or one with a value of `null`. Added handling to treat this as an empty string.
 # Hash set filter code path was never reachable due to an incorrect equality check.

h2. Other minor changes
 # Removed redundant classes and constants
 # Improved test coverage
 # Extracted repeated strings as constant members

> DeduplicateRecord does not add keys to distributed map cache
> ------------------------------------------------------------
>
>                 Key: NIFI-11945
>                 URL: https://issues.apache.org/jira/browse/NIFI-11945
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.23.0
>         Environment: Docker
>            Reporter: Peter Kimberley
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The `DeduplicateRecord` processor supports the use of a distributed map cache
> (DMC).
> After generating the record key, it checks for the existence of that key in
> the cache. It then calls `DistributedMapCacheClientWrapper::put()`, which, in
> this case, is a no-op. Therefore, a cache entry is never written and records
> are always routed to the `non-duplicate` relationship.
> The correct behaviour would be for
> `DistributedMapCacheClientWrapper::contains()` to call
> `DistributedMapCacheClient::putIfAbsent()`, which would atomically check/set
> the key in the target cache.
> An additional problem is an NPE where a DMC is used and the
> `DeduplicateRecord` property `Record Hashing Algorithm` is set to `NONE`.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
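
For illustration only, the atomic check/set and the `null` handling described above could be sketched roughly as follows. This is not the code from the PR: `KeyCache`, `InMemoryKeyCache`, `DedupSketch`, `keyPart` and `isDuplicate` are hypothetical names, and the in-memory map is only a simplified stand-in for a distributed map cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical, simplified stand-in for a map cache client; not the NiFi API. */
interface KeyCache {
    /** Stores the key only if it is not yet present; returns true if it was added. */
    boolean putIfAbsent(String key, String value);
}

/** In-memory implementation used only for this illustration. */
class InMemoryKeyCache implements KeyCache {
    private final ConcurrentMap<String, String> entries = new ConcurrentHashMap<>();

    @Override
    public boolean putIfAbsent(String key, String value) {
        // ConcurrentMap.putIfAbsent returns null when no previous mapping existed.
        return entries.putIfAbsent(key, value) == null;
    }
}

class DedupSketch {
    /** Treats a missing or null field value as an empty string instead of throwing an NPE. */
    static String keyPart(String fieldValue) {
        return fieldValue == null ? "" : fieldValue;
    }

    /** Returns true when the record key has already been seen. */
    static boolean isDuplicate(KeyCache cache, String recordKey) {
        // A single atomic call replaces the separate "contains" and "put" steps,
        // so two workers cannot both conclude that the same key is new.
        return !cache.putIfAbsent(recordKey, "");
    }

    public static void main(String[] args) {
        KeyCache cache = new InMemoryKeyCache();
        String key = "a|" + keyPart(null);
        System.out.println(isDuplicate(cache, key)); // false: first sighting
        System.out.println(isDuplicate(cache, key)); // true: key is already cached
    }
}
```

With a separate existence check followed by a put, two workers could both see the key as absent before either writes it, and both records would be routed to `non-duplicate`. Folding the two steps into one conditional write is what closes that window.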