Currently Rat does not inspect the contents of archives.  This means that a
jar that does not meet the licensing requirements of a project could be
included and would not be detected.

Currently the DefaultAnalyser simply marks the archives as archives and
does nothing more with them.  The proposed Tika change does not alter this,
but it does give us better identification of archives.

I would like to see the DefaultAnalyser open the archives and process the
contents via what is essentially another default analyser instance.  The
idea is that the result of scanning the contents of the archive will be
reported as the scan of the archive itself.  So if the contents carry three
licenses, the report for the archive itself will state that it has those
three licenses.
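
A rough sketch of the idea, using only java.util.zip rather than the Rat or
Tika APIs.  Everything here is hypothetical: the class and method names, the
toy licensesInText() matcher standing in for Rat's license matching, and the
name-based looksLikeArchive() check standing in for Tika's detection.  It
only shows the shape of "scan the contents, report the union on the archive",
including nested archives.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ArchiveScanSketch {

    /** Hypothetical stand-in for Rat's license matching; not Rat's API. */
    static Set<String> licensesInText(String text) {
        Set<String> found = new TreeSet<>();
        if (text.contains("Apache License")) {
            found.add("Apache-2.0");
        }
        if (text.contains("MIT License")) {
            found.add("MIT");
        }
        return found;
    }

    /** Naive name-based check; the proposal would rely on Tika's detection. */
    static boolean looksLikeArchive(String name) {
        String lower = name.toLowerCase();
        return lower.endsWith(".jar") || lower.endsWith(".zip");
    }

    /** Union of the licenses found in every entry, recursing into nested archives. */
    static Set<String> licensesInArchive(InputStream in) {
        Set<String> licenses = new TreeSet<>();
        try {
            // Deliberately not closed: closing would also close the caller's stream.
            ZipInputStream zip = new ZipInputStream(in);
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // readAllBytes() stops at the end of the current entry.
                byte[] content = zip.readAllBytes();
                if (looksLikeArchive(entry.getName())) {
                    licenses.addAll(licensesInArchive(new ByteArrayInputStream(content)));
                } else {
                    licenses.addAll(licensesInText(new String(content, StandardCharsets.UTF_8)));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return licenses;
    }

    /** Builds an in-memory zip from name/content pairs; for demonstration only. */
    static byte[] zipOf(Map<String, byte[]> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> entry : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(entry.getKey()));
                zip.write(entry.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] inner = zipOf(Map.of(
                "LICENSE", "MIT License".getBytes(StandardCharsets.UTF_8)));
        byte[] outer = zipOf(Map.of(
                "NOTICE", "Licensed under the Apache License".getBytes(StandardCharsets.UTF_8),
                "lib/inner.jar", inner));
        // The outer archive is reported with the licenses of everything inside it.
        System.out.println(licensesInArchive(new ByteArrayInputStream(outer)));
    }
}
```

In this sketch the outer archive ends up reported with both licenses, one
from its own NOTICE entry and one from the nested jar.  A real
implementation would also need to guard against very large or deeply nested
archives, which this does not attempt.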

Tika can provide hashes of files.  I suggest we use those to track files
that have already been processed, so if an archive is found twice we report
the first one with its licenses and such, and the second as a duplicate of
the first.
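
The duplicate tracking could look roughly like the following.  This is a
sketch under assumptions: it computes SHA-256 with the JDK's MessageDigest
as a stand-in for whatever hash Tika would supply, and SeenFilesSketch and
recordOrFindDuplicate() are hypothetical names, not existing Rat classes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class SeenFilesSketch {

    // Maps a content hash to the path of the first file seen with that hash.
    private final Map<String, String> firstPathByHash = new HashMap<>();

    /** Lower-case hex SHA-256 of the content; stand-in for a Tika-provided hash. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JVM", e);
        }
    }

    /**
     * Records the file under its hash. Returns the path of the earlier
     * identical file if there is one, or empty if this is the first sighting.
     */
    Optional<String> recordOrFindDuplicate(String path, byte[] content) {
        String previous = firstPathByHash.putIfAbsent(sha256Hex(content), path);
        return Optional.ofNullable(previous);
    }
}
```

The analyser would do the full scan when the result is empty, and otherwise
report the file as a duplicate of the returned path.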

I think we should add the hashes to the XML report as properties of the
resource element describing the file.  The hashes can be useful in
exploring SBOM entries and similar.
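
To make the report idea concrete, here is one possible shape, written with
the JDK's XMLStreamWriter.  The <hash> child element and its "algorithm" and
"value" attributes are assumptions made up for illustration; whether the
hashes end up as attributes or child elements of the resource element is an
open choice, and none of this is the current Rat report schema.

```java
import java.io.StringWriter;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class ResourceHashXmlSketch {

    /** Renders a resource element with one hypothetical hash child per algorithm. */
    static String resourceElement(String name, Map<String, String> hashesByAlgorithm) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("resource");
            xml.writeAttribute("name", name);
            // TreeMap gives a stable, sorted order so reports are comparable.
            for (Map.Entry<String, String> hash : new TreeMap<>(hashesByAlgorithm).entrySet()) {
                xml.writeEmptyElement("hash");
                xml.writeAttribute("algorithm", hash.getKey());
                xml.writeAttribute("value", hash.getValue());
            }
            xml.writeEndElement();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```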

Thoughts?

Claude
