[RAT][DISCUSS] Process Archive Files
Greetings,

The code base has the ability to read archive files, but that ability is only used to create a "walker" to read archives passed in on the command line. I propose that we modify the processing of ARCHIVE type files to scan them for licenses.

*Proposal:*

What I propose is that we extract each file in the ARCHIVE as a Document and process it. Any results from processing the Document will be added to the archive's Document instance, so any licenses found in files within the archive are reported as licenses for the archive.

Processing of archives will exclude files listed in the filesToIgnore ReportConfiguration property.
Processing of archives will NOT exclude directories listed in the directoriesToIgnore ReportConfiguration property.

*Backward Compatibility:*

To keep this from breaking existing Rat execution I propose a new configuration option and an enumeration of values for that option.

enum Processing { NOTIFICATION, PRESENCE, ABSENCE }

NOTIFICATION - The default. The current level of reporting, where we just count the archives and list them in the report. No internal processing of the archive.
PRESENCE - Report the presence of any licenses found in the archive. In this case we ignore any UNKNOWN license entries and only report the licenses found.
ABSENCE - Like PRESENCE, but also reports UNKNOWN licenses.

The command line option "--archive" will be used to set the property. The value of the property is not case sensitive.

*Examples:*

"--archive NOTIFICATION" will execute exactly as RAT does now.
"--archive Presence" will report any known licenses found in the archive.
"--archive absence" will report the presence of any licenses found as well as the detection of files without licenses.

*XML output changes:*

Currently archives are listed in the XML output as: [XML sample missing]

This proposal would, in cases where licenses are discovered, add the license entries as with STANDARD resource types.
For example: [XML sample missing]

*POC:*

I have a POC that implements minor changes to the Tika-based code base. The changes are:

    modified: apache-rat-core/src/main/java/org/apache/rat/Report.java
    modified: apache-rat-core/src/main/java/org/apache/rat/analysis/DefaultAnalyserFactory.java
    modified: apache-rat-core/src/main/java/org/apache/rat/report/xml/XmlReportFactory.java
    modified: apache-rat-core/src/main/java/org/apache/rat/walker/ArchiveWalker.java
    modified: apache-rat-core/src/test/java/org/apache/rat/analysis/DefaultAnalyserFactoryTest.java

The changes include updating the ArchiveWalker code to use more current commons-compress capabilities for archive type detection and reading. The other changes support additional method arguments.

Thoughts?

Claude

--
LinkedIn: http://www.linkedin.com/in/claudewarren
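
As a rough illustration of the "--archive" option described above, the sketch below shows one way the case-insensitive option value could be mapped onto the proposed Processing enum, and how each level might decide which license entries end up on the archive's report entry. Only the enum constants and the option values come from the proposal. The class name ArchiveProcessingSketch, the methods fromOption and licensesToReport, and the use of plain strings for license names are assumptions made for illustration and are not taken from the POC or from the RAT code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ArchiveProcessingSketch {

    /** The three processing levels named in the proposal. */
    enum Processing {
        NOTIFICATION, PRESENCE, ABSENCE;

        /** Hypothetical helper: parses the "--archive" value ignoring case. */
        static Processing fromOption(String value) {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        }
    }

    /** Placeholder marker for a file with no recognised license. */
    static final String UNKNOWN = "UNKNOWN";

    /**
     * Hypothetical helper: given the license names found in the files of
     * one archive, returns the entries to attach to the archive itself.
     */
    static List<String> licensesToReport(Processing mode, List<String> found) {
        List<String> result = new ArrayList<>();
        if (mode == Processing.NOTIFICATION) {
            return result; // archive is only listed, never inspected
        }
        for (String license : found) {
            // PRESENCE drops UNKNOWN entries, ABSENCE keeps them
            if (mode == Processing.ABSENCE || !UNKNOWN.equals(license)) {
                result.add(license);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder license names, not real RAT license family ids
        List<String> found = Arrays.asList("Apache-2.0", UNKNOWN, "MIT");
        for (String option : new String[] {"NOTIFICATION", "Presence", "absence"}) {
            Processing mode = Processing.fromOption(option);
            System.out.println(mode + " -> " + licensesToReport(mode, found));
        }
    }
}
```

In this sketch an unrecognised option value makes valueOf throw an IllegalArgumentException. How the real command line handling should report an invalid value is left open here.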
Re: [PR] WIP: RAT-369: Add spotbugs to build and generate a report [creadur-rat]
Claudenw commented on code in PR #238:
URL: https://github.com/apache/creadur-rat/pull/238#discussion_r1587463735

## apache-rat-core/src/main/java/org/apache/rat/config/parameters/ComponentType.java:
##
@@ -28,6 +28,6 @@ public enum ComponentType {
 MATCHER,
 /** A Parameter for example the "id" parameter found in every component */
 PARAMETER,
-/** A parameter that is supplied by the environment. Currently systems using builders have to handle seting this. For example the list of matchers for the "MatcherRefBuilder" */
-BULID_PARAMETER
+/** A parameter that is supplied by the environment. Currently systems using builders have to handle setting this. For example the list of matchers for the "MatcherRefBuilder" */
+BUILD_PARAMETER

Review Comment:
Thanks for catching this.

## apache-rat-plugin/src/main/java/org/apache/rat/mp/util/ignore/GlobIgnoreMatcher.java:
##
@@ -25,11 +25,7 @@
 import java.io.File;
 import java.io.FileReader;
 import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collection;
-import java.util.List;
-import java.util.Optional;
+import java.util.*;

Review Comment:
Do not use the '*' format for imports. If you are using IntelliJ, you can raise the number of imports from the same package that are allowed before it switches to '*', which stops this.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@creadur.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org