Currently Rat does not inspect archives. This means that a jar containing files whose licenses do not meet the project's licensing requirements could be included without being detected.
Currently the DefaultAnalyser simply marks archives as archives and does nothing more with them. The proposed Tika change does not alter this, but it does give us better identification of archives. I would like to see the DefaultAnalyser open archives and process their contents via what is essentially another DefaultAnalyser instance. The result of scanning the contents of an archive would then be reported as the scan of the archive itself. So if the files inside an archive carry 3 different licenses, the report for the archive will list those 3 licenses.

Tika can provide hashes of files. I suggest we use those to track files that have already been processed, so that if an archive is found twice we report the first occurrence with its licenses and other details, and the second as a duplicate of the first.

I think we should add the hashes to the XML report as properties of the resource element describing the file. The hashes can be useful in exploring SBOM entries and similar.

Thoughts?
Claude
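To make the proposal concrete, here is a minimal sketch of recursive archive scanning with hash-based de-duplication. It is not Rat's API, and it rests on several assumptions. `ArchiveScanSketch`, `scan`, `Result`, and `detectLicenses` are hypothetical names. `detectLicenses` is a naive substring stand-in for Rat's real license matchers. A "PK\3\4" magic-byte check stands in for Tika's archive detection, and SHA-256 via `java.security.MessageDigest` stands in for whatever digests Tika would supply. The `!/` separator for naming nested entries is also an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;
import java.util.zip.*;

/** HYPOTHETICAL sketch: scan archives recursively and de-duplicate resources by content hash. */
class ArchiveScanSketch {

    /** One reported resource: its SHA-256, its licenses, and (for duplicates) the first copy's name. */
    static final class Result {
        final String name;
        final String sha256;          // candidate property for the XML resource element
        final Set<String> licenses;   // empty for duplicates; the first copy carries them
        final String duplicateOf;     // null unless this repeats an earlier resource

        Result(String name, String sha256, Set<String> licenses, String duplicateOf) {
            this.name = name;
            this.sha256 = sha256;
            this.licenses = licenses;
            this.duplicateOf = duplicateOf;
        }
    }

    private final Map<String, Result> firstByHash = new HashMap<>();
    final List<Result> results = new ArrayList<>();

    /** HYPOTHETICAL stand-in for Rat's license matchers: naive substring checks. */
    static Set<String> detectLicenses(byte[] content) {
        String text = new String(content, StandardCharsets.UTF_8);
        Set<String> found = new TreeSet<>();
        if (text.contains("Apache License")) found.add("AL2.0");
        if (text.contains("MIT License")) found.add("MIT");
        return found;
    }

    /** Assumption: the zip local-file-header signature stands in for Tika's archive detection. */
    static boolean isZip(byte[] c) {
        return c.length >= 4 && c[0] == 'P' && c[1] == 'K' && c[2] == 3 && c[3] == 4;
    }

    /** Lower-case hex SHA-256 of the content. */
    static String sha256(byte[] data) throws NoSuchAlgorithmException {
        StringBuilder hex = new StringBuilder();
        for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Scans one resource, recursing into archives; returns the licenses found in it. */
    Set<String> scan(String name, byte[] content) throws IOException, NoSuchAlgorithmException {
        String hash = sha256(content);
        Result first = firstByHash.get(hash);
        if (first != null) {
            // Seen before: report as a duplicate of the first copy, but still
            // propagate its licenses so an enclosing archive's report stays complete.
            results.add(new Result(name, hash, Set.of(), first.name));
            return first.licenses;
        }
        Set<String> licenses = new TreeSet<>();
        if (isZip(content)) {
            try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(content))) {
                ZipEntry entry;
                while ((entry = zin.getNextEntry()) != null) {
                    if (!entry.isDirectory()) {
                        // The archive's report is the union of its contents' licenses.
                        licenses.addAll(scan(name + "!/" + entry.getName(), zin.readAllBytes()));
                    }
                }
            }
        } else {
            licenses.addAll(detectLicenses(content));
        }
        Result r = new Result(name, hash, licenses, null);
        firstByHash.put(hash, r);
        results.add(r);
        return licenses;
    }

    /** Helper: builds an in-memory zip from entry names to text contents. */
    static byte[] zip(Map<String, String> entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                zout.putNextEntry(new ZipEntry(e.getKey()));
                zout.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }
}
```

Scanning the same jar bytes as `a.jar` and then as `b.jar` reports `a.jar` with the union of its entries' licenses. `b.jar` is reported only as a duplicate of `a.jar`, and its contents are not re-scanned. Keying on the content hash rather than the file name is what lets a renamed or relocated copy of the same archive be recognized.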