Jon Zeolla created METRON-1052:
----------------------------------

             Summary: Add forensic similarity hash functions to Stellar
                 Key: METRON-1052
                 URL: https://issues.apache.org/jira/browse/METRON-1052
             Project: Metron
          Issue Type: Improvement
            Reporter: Jon Zeolla


This is a follow-on to METRON-539.  Currently we have Stellar functions to 
perform cryptographic hashing operations.  It would be useful to expand this to 
support forensic similarity hash functions so we could compare the similarity 
of inputs.  I can see two main components of this, and one secondary/lower 
priority thought:

(1) Support for LSH and/or CTPH (context triggered piecewise hashing) 
functions, aka forensic similarity hash functions, such as sdhash or 
spamsum/ssdeep.  I quickly found some code examples[1][2] in Java that have 
compatible licenses, in case that is appealing.
(2) An approximate string matching function to establish a similarity rating 
between n hashes.  ssdeep currently has this via its -x and -k options, and 
there are some other thoughts[3] on how best to do this, but I'm aware there 
are numerous ways we may want to consider comparing strings for similarity 
(Damerau-Levenshtein distance, longest common subsequence, etc.).
(3) Similar to (2), I could see some applicability as a streaming enrichment, 
but as a native feature this would be a much lower priority and potentially a 
separate PR.
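
For illustration only, here is a minimal sketch of the kind of comparison 
described in (2).  It computes the optimal string alignment distance (the 
restricted form of Damerau-Levenshtein) between two strings and scales it to a 
0-100 rating.  The class name, method names, and the scaling are assumptions 
made for this example; this is not ssdeep's scoring algorithm and not an 
existing Stellar function.

```java
// Hypothetical sketch: names and the 0-100 scaling are illustrative assumptions.
public final class SimilaritySketch {
    private SimilaritySketch() {}

    /**
     * Optimal string alignment distance: Levenshtein edits (insert, delete,
     * substitute) plus transposition of two adjacent characters.
     */
    public static int osaDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;  // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;  // insert all of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "bc" <-> "cb".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    /** Scales the distance to a rating from 0 to 100, where 100 means identical. */
    public static int similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) return 100;  // two empty strings are identical
        int distance = osaDistance(a, b);
        return (int) Math.round(100.0 * (longest - distance) / longest);
    }

    public static void main(String[] args) {
        System.out.println(osaDistance("abcd", "acbd"));  // one transposition
        System.out.println(similarity("abcd", "acbd"));
    }
}
```

A real implementation would probably compare the hash strings produced by (1) 
rather than raw inputs, and might prefer an existing, license-compatible 
library over hand-rolled code.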

1:  
https://github.com/pcbje/autopsy-ahbm/blob/master/src/com/pcbje/ahbm/Sdhash.java
2:  https://github.com/tdebatty/java-spamsum
3:  
https://www.virusbulletin.com/virusbulletin/2015/11/optimizing-ssdeep-use-scale



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
