Jon Zeolla created METRON-1052:
----------------------------------
Summary: Add forensic similarity hash functions to Stellar
Key: METRON-1052
URL: https://issues.apache.org/jira/browse/METRON-1052
Project: Metron
Issue Type: Improvement
Reporter: Jon Zeolla
This is a follow-on to METRON-539. Stellar currently has functions for
cryptographic hashing. It would be useful to extend this with forensic
similarity hash functions so that we can compare how similar two inputs are.
I can see two main components to this, plus one secondary, lower-priority
thought:
(1) Support for LSH and/or CTPH hash functions (aka forensic similarity hash
functions) such as sdhash or spamsum/ssdeep. I quickly found some Java code
examples[1][2] with compatible licenses, in case that is appealing.
(2) An approximate string matching function to establish a similarity rating
between n hashes. ssdeep currently has this via its -x and -k options, and
there are some other thoughts[3] on how best to do this, but I'm aware there
are numerous ways we may want to consider comparing strings for similarity
(Damerau-Levenshtein distance, longest common subsequence, etc.).
(3) Similar to (2), I could see some applicability as a streaming enrichment,
but as a native feature this would be a much lower priority and potentially a
separate PR.
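
To make (2) a bit more concrete, here is a minimal sketch of one way a
similarity rating between two hash strings could be computed. It uses the
optimal string alignment variant of Damerau-Levenshtein distance and scales
the result to 0-100. The class and method names (HashSimilarity, osaDistance,
score) are hypothetical and not part of Metron or Stellar, the sample strings
in main are made up, and this is not ssdeep's own comparison algorithm.

```java
/** Hypothetical helper for illustration only; not part of Metron or Stellar. */
final class HashSimilarity {

    private HashSimilarity() {}

    /**
     * Optimal string alignment distance (restricted Damerau-Levenshtein):
     * counts insertions, deletions, substitutions, and transpositions of
     * two adjacent characters.
     */
    static int osaDistance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];

        // Turning a prefix into the empty string costs one edit per character.
        for (int i = 0; i <= n; i++) d[i][0] = i;
        for (int j = 0; j <= m; j++) d[0][j] = j;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1,      // deletion
                                 d[i][j - 1] + 1),     // insertion
                        d[i - 1][j - 1] + cost);       // substitution or match
                // Adjacent transposition, e.g. "ab" -> "ba".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    /**
     * Similarity rating from 0 (nothing in common) to 100 (identical),
     * normalized by the length of the longer input. This scaling is an
     * assumption made for this sketch.
     */
    static int score(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 100; // two empty strings are treated as identical
        }
        int dist = osaDistance(a, b);
        return (int) Math.round(100.0 * (longest - dist) / longest);
    }

    public static void main(String[] args) {
        // Made-up digest strings, not real ssdeep output.
        String first = "ZmFrZWhhc2gxMjM0NTY3ODkw";
        String second = "ZmFrZWhhc2gxMjM0NTY4NzAw";
        System.out.println("distance = " + osaDistance(first, second));
        System.out.println("score    = " + score(first, second));
    }
}
```

A Stellar function wrapping something like this would presumably take two
strings and return the score, and longest common subsequence or another metric
could be swapped in behind the same signature.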
[1] https://github.com/pcbje/autopsy-ahbm/blob/master/src/com/pcbje/ahbm/Sdhash.java
[2] https://github.com/tdebatty/java-spamsum
[3] https://www.virusbulletin.com/virusbulletin/2015/11/optimizing-ssdeep-use-scale