The hash-based deduplication strategy relied on the built-in "md5" attribute to offload the work to the database. That attribute was deprecated and, AFAICT, is gone as of Mongo 5:

https://www.mongodb.com/docs/manual/core/gridfs/#files.md5

I am proposing two changes:

* Remove deduplication
* Create a MongoDB DistributedMapCache client that can query on the file metadata. GridFS stores metadata separately from chunks, which makes lookups that way cheap and flexible.

I could easily add that to this PR, which already covers Testcontainers integration and makes it easy to test the changed behavior: https://github.com/apache/nifi/pull/6460

Thoughts?
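
To make the metadata idea a bit more concrete, here is a rough sketch (not code from the PR) of how a client could compute a digest itself and keep it in the file's metadata, since the server no longer supplies "md5". The class name, the "sha256" field name, and the choice of SHA-256 are all assumptions for illustration. With the MongoDB Java driver, I would expect the map to be passed via GridFSUploadOptions.metadata(...) and the lookup to be a filter on the "metadata.sha256" field, but that part is left out here so the snippet runs with the JDK alone.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the names below are illustrative, not part of NiFi or GridFS.
public class GridFsMetadataKey {

    // Assumed metadata field name for the client-side digest.
    static final String HASH_FIELD = "sha256";

    // Computes a lowercase hex SHA-256 digest of the content on the client side.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // Mask to 0..255 so each byte renders as exactly two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Builds the metadata that would be stored alongside the file in fs.files.
    static Map<String, Object> buildMetadata(byte[] content) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put(HASH_FIELD, sha256Hex(content));
        return metadata;
    }

    // Dotted field path a lookup on the files collection would filter on.
    static String queryField() {
        return "metadata." + HASH_FIELD;
    }

    public static void main(String[] args) {
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(queryField() + " = " + buildMetadata(content).get(HASH_FIELD));
    }
}
```

Because the lookup only touches the files collection and not the chunks, an index on that metadata field should keep it cheap, though I have not measured that.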