[RAT][DISCUSS] Process Archive Files

2024-05-02 Thread Claude Warren
Greetings,

The code base has the ability to read archive files.  It is only used to
create a "walker" to read archives passed in on the command line.  I
propose that we modify the processing of ARCHIVE type files to scan them
for licenses.

*Proposal:*
What I propose is that we extract each file in the ARCHIVE as a Document
and process it.  Any results from processing the Document will be added to
the archive's Document instance.  So any licenses found in files within the
archive are reported as licenses for the archive.

Processing of archives will exclude files listed in the filesToIgnore
ReportConfiguration property.
Processing of archives will NOT exclude directories listed in the
directoriesToIgnore ReportConfiguration property.
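
To make the proposal concrete, here is a rough sketch of the idea, not the POC code. It uses java.util.zip from the JDK instead of commons-compress so that it runs standalone. The class name, the detectLicense helper, the license ids, and the treatment of filesToIgnore as a set of plain file names are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ArchiveScanSketch {

    /** Hypothetical stand-in for Rat's license matching: checks for one marker string. */
    static String detectLicense(String text) {
        return text.contains("Apache License, Version 2.0") ? "APACHE-2.0" : "UNKNOWN";
    }

    /**
     * Treats every file entry as its own document and collects the results
     * on behalf of the archive. File names are filtered; directories are not.
     */
    static Set<String> scan(byte[] archiveBytes, Set<String> filesToIgnore) {
        Set<String> found = new TreeSet<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archiveBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directory entries carry no content to scan
                }
                String name = entry.getName();
                String baseName = name.substring(name.lastIndexOf('/') + 1);
                if (filesToIgnore.contains(baseName)) {
                    continue; // assumed file-name based exclusion
                }
                // readAllBytes stops at the end of the current entry
                String text = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                found.add(detectLicense(text));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    /** Builds a small in-memory archive to exercise the sketch. */
    static byte[] sampleArchive() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            addEntry(zip, "src/A.java",
                    "// Licensed under the Apache License, Version 2.0\nclass A {}");
            addEntry(zip, "docs/notes.txt", "no license header here");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private static void addEntry(ZipOutputStream zip, String name, String text) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(text.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }
}
```

Calling ArchiveScanSketch.scan(ArchiveScanSketch.sampleArchive(), Set.of("notes.txt")) skips docs/notes.txt by file name, while the docs directory itself is never filtered.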

*Backward Compatibility:*
To keep this from breaking existing Rat execution I propose a new
configuration option and an enumeration of values for that option.

enum Processing { NOTIFICATION, PRESENCE, ABSENCE}

NOTIFICATION - The default.  The current level of reporting where we just
count the archives and list them in the report.  No internal processing of
the archive.

PRESENCE - Report the presence of any licenses found in the archive.  In
this case we ignore any UNKNOWN license entries and only report the
licenses found.

ABSENCE - Like PRESENCE, but also reports UNKNOWN licenses.

The command line option "--archive" will be used to set the property. The
value of the property is not case sensitive.

*Examples:*
"--archive NOTIFICATION" will execute exactly as RAT does now.
"--archive Presence" will report any known licenses found in the archive.
"--archive absence" will report presence of any licenses found as well as
detection of files without licenses.
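
As a rough illustration of the option handling, the following sketch shows one way the case-insensitive value could map onto the enum and how each level could filter what is reported. Only the enum and its three values come from the proposal. The class name, the parse and reported helpers, and the license ids are assumptions for illustration and are not taken from the POC.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class ArchiveOptionSketch {

    enum Processing { NOTIFICATION, PRESENCE, ABSENCE }

    /** Maps the "--archive" value onto the enum, ignoring case. */
    static Processing parse(String value) {
        // valueOf throws IllegalArgumentException for unrecognised values
        return Processing.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** Returns the license ids that would be reported for an archive at the given level. */
    static List<String> reported(Processing mode, List<String> licensesInArchive) {
        switch (mode) {
            case PRESENCE:
                // report only known licenses; UNKNOWN entries are dropped
                return licensesInArchive.stream()
                        .filter(id -> !"UNKNOWN".equals(id))
                        .collect(Collectors.toList());
            case ABSENCE:
                // report everything, including UNKNOWN
                return List.copyOf(licensesInArchive);
            default:
                // NOTIFICATION: the archive is listed, its contents are not processed
                return List.of();
        }
    }
}
```

With this sketch, parse("Presence") and parse("PRESENCE") both yield Processing.PRESENCE, and a value outside the enumeration fails fast with an IllegalArgumentException.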

*XML output changes:*
Currently archives are listed in the XML output as:

"

This proposal would, in cases where licenses are discovered, add the
license entries as with STANDARD resource types.  For example:

"
  
  


*POC*:
I have a POC that implements minor changes to the Tika-based code base.
The changes are:
modified: apache-rat-core/src/main/java/org/apache/rat/Report.java
modified: apache-rat-core/src/main/java/org/apache/rat/analysis/DefaultAnalyserFactory.java
modified: apache-rat-core/src/main/java/org/apache/rat/report/xml/XmlReportFactory.java
modified: apache-rat-core/src/main/java/org/apache/rat/walker/ArchiveWalker.java
modified: apache-rat-core/src/test/java/org/apache/rat/analysis/DefaultAnalyserFactoryTest.java

The changes include updating the ArchiveWalker code to use more current
commons-compress capabilities for archive type detection and reading.

Other changes are to support additional method arguments.

Thoughts?
Claude

-- 
LinkedIn: http://www.linkedin.com/in/claudewarren


Re: [PR] WIP: RAT-369: Add spotbugs to build and generate a report [creadur-rat]

2024-05-02 Thread via GitHub


Claudenw commented on code in PR #238:
URL: https://github.com/apache/creadur-rat/pull/238#discussion_r1587463735


##
apache-rat-core/src/main/java/org/apache/rat/config/parameters/ComponentType.java:
##
@@ -28,6 +28,6 @@ public enum ComponentType {
 MATCHER,
 /** A Parameter for example the "id" parameter found in every component */
 PARAMETER,
-/** A parameter that is supplied by the environment.  Currently systems 
using builders have to handle seting this.  For example the list of matchers 
for the "MatcherRefBuilder" */
-BULID_PARAMETER 
+/** A parameter that is supplied by the environment.  Currently systems 
using builders have to handle setting this.  For example the list of matchers 
for the "MatcherRefBuilder" */
+BUILD_PARAMETER

Review Comment:
   Thanks for catching this.



##
apache-rat-plugin/src/main/java/org/apache/rat/mp/util/ignore/GlobIgnoreMatcher.java:
##
@@ -25,11 +25,7 @@
 import java.io.File;
 import java.io.FileReader;
 import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collection;
-import java.util.List;
-import java.util.Optional;
+import java.util.*;

Review Comment:
   Do not use the '*' format for imports.  If you are using IntelliJ you can
set a large threshold for the number of imports from the same package before
'*' is used, which stops this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@creadur.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org