I changed the standard report slightly. It now looks like:

  s src/test/resources/elements/ILoggerFactory.java
      MIT    MIT    The MIT License
  b src/test/resources/elements/Image.png
  n src/test/resources/elements/LICENSE
  n src/test/resources/elements/NOTICE
 !s src/test/resources/elements/Source.java
      ?????  ?????  Unknown license
  s src/test/resources/elements/Text.txt
      AL     AL     Apache License Version 2.0
  s src/test/resources/elements/TextHttps.txt
      AL     AL     Apache License Version 2.0
  s src/test/resources/elements/Xml.xml
      AL     AL     Apache License Version 2.0
  s src/test/resources/elements/buildr.rb
      AL     AL     Apache License Version 2.0
  a src/test/resources/elements/dummy.jar
  g src/test/resources/elements/generated.txt
      GEN    GEN    Generated Files
  b src/test/resources/elements/plain.json
  s src/test/resources/elements/tri.txt
      AL     AL     Apache License Version 2.0
      BSD-3  BSD-3  BSD 3 clause
      BSD-3  TMF    The Telemanagement Forum License
 !s src/test/resources/elements/sub/Empty.txt
      ?????  ?????  Unknown license

*****************************************************

On Fri, Mar 29, 2024 at 3:50 PM Claude Warren <cla...@xenei.com> wrote:

> I have a proposed change. See
> https://github.com/Claudenw/creadur-rat/pull/6/files
> Note that this pull request is the difference between multiple targets and
> the change to move to RAT-366 (Move to single matche call)
>
> Example output in
> https://github.com/Claudenw/creadur-rat/tree/Multiple_license_report/apache-rat/src/site/examples
>
> I reworked the MetaData class and removed all the funky naming. All we
> really needed to capture for a document is what licenses matched and which
> of those are approved licenses.
>
> The new rat report (in examples) has a "resource" element for each file
> that was checked. The resource still has a name attribute and I added a
> type attribute that specifies the type of file that it is (e.g. archive,
> standard, binary).
> It has two possible child elements: "license" and "sample".
>
> The license element has several attributes: approval, family, id, and name.
> A license can have a notes child element that contains the notes for the
> license. These are not usually displayed but are included for the
> generated files license.
>
> The sample element contains text from the license. It is only included
> when the license type is unknown.
>
> The sample and notes text is enclosed in a CDATA block.
>
> I reworked the standard report. This is probably a breaking change for
> anyone who is parsing the text, but then they should be using a custom XSLT
> to extract the info they want.
>
> The new report looks like:
>
>
> *****************************************************
> Summary
> -------
> Generated at: 2024-03-29T15:01:24+01:00
>
> Notes: 2
> Binaries: 2
> Archives: 1
> Standards: 8
>
> Apache Licensed: 5
> Generated Documents: 1
>
> JavaDocs are generated, thus a license header is optional.
> Generated files do not require license headers.
>
> 2 Unknown Licenses
>
> *****************************************************
>
> Files with unapproved licenses:
>
>   src/test/resources/elements/Source.java
>   src/test/resources/elements/sub/Empty.txt
>
> *****************************************************
>
> *****************************************************
> Documents with unapproved licenses will start with a '!'
> The next character identifies the document type.
>
>   char  type
>    a    Archive file
>    b    Binary file
>    g    Generated file
>    n    Notice file
>    s    Standard file
>    u    Unknown file.
>
>   s src/test/resources/elements/ILoggerFactory.java
>       MIT    The MIT License
>   b src/test/resources/elements/Image.png
>   n src/test/resources/elements/LICENSE
>   n src/test/resources/elements/NOTICE
>  !s src/test/resources/elements/Source.java
>       ?????  Unknown license
>   s src/test/resources/elements/Text.txt
>       AL     Apache License Version 2.0
>   s src/test/resources/elements/TextHttps.txt
>       AL     Apache License Version 2.0
>   s src/test/resources/elements/Xml.xml
>       AL     Apache License Version 2.0
>   s src/test/resources/elements/buildr.rb
>       AL     Apache License Version 2.0
>   a src/test/resources/elements/dummy.jar
>   g src/test/resources/elements/generated.txt
>       GEN    Generated Files
>   b src/test/resources/elements/plain.json
>   s src/test/resources/elements/tri.txt
>       AL     Apache License Version 2.0
>       BSD-3  BSD 3 clause
>       TMF    The Telemanagement Forum License
>  !s src/test/resources/elements/sub/Empty.txt
>       ?????  Unknown license
>
> *****************************************************
>
> I think this solves the problem.
>
> Claude
>
> On Thu, Mar 28, 2024 at 10:17 AM Claude Warren <cla...@xenei.com> wrote:
>
>> SPDX[1] has an interesting format where they can report 2 (or more?)
>> licenses in one.
>>
>> There are a couple of things here that we will need to look at:
>>
>>    1. Metadata only stores one matching license.
>>    2. Can we modify the output XML to list multiple licenses for a file
>>    without too much trouble? I don't think the existing XSLT will
>>    have problems with it.
>>    3. SPDX [1] has an interesting format where they can report 2 (or
>>    more?) licenses in one. Perhaps we should use their format for license
>>    identification. This would allow us to report the SPDX tags that
>>    reference multiple licenses.
>>
>> Also, every time I look at the LicenseFamily code I wonder why there is a
>> limit of 5 on the number of characters in the license family category. It
>> feels like a formatting issue was pushed into the internal code. Drives me
>> crazy.
>>
>> [1] https://spdx.dev/learn/handling-license-info/
>>
>> On Thu, Mar 28, 2024 at 10:01 AM P. Ottlinger <pottlin...@apache.org>
>> wrote:
>>
>>> Hi,
>>>
>>> On 28.03.24 at 09:41, Claude Warren wrote:
>>> > I got back to looking at 366 and discovered a problem that I think has
>>> > been lurking in the system for some time. Basically, if a file has the
>>> > signatures for more than one license only one will be reported, and the
>>> > selection of which one is (I think) random.
>>>
>>> thanks for analyzing this issue, which explains some random test
>>> failures ..... :(
>>>
>>> <snip>
>>>
>>> > My suggestion is we report all license matches and let the user decide
>>> > what to do.
>>>
>>> I'm in favour of reporting as many licenses as possible, but assume this
>>> will break the current report format, that is optimized for one license
>>> only.
>>>
>>> Not sure if downstream users have problems with that change?!
>>>
>>> Would we have a maximum license number or could this result in an
>>> "endless" list of reported licenses, if a file with "all" thinkable
>>> license files is provided to RAT? Initially I thought of adding a new
>>> analyzer/reporting state "MULTIPLE" that is reported in the scan and a
>>> detailed report that lists up to x (maybe 3 or 5?) maximum licenses per
>>> file - WDYT?
>>>
>>> >
>>> > My plan is to create a branch that reports multiple matching licenses
>>> > and then merge that into RAT-366 to resolve the problem. This should
>>> > give us all a chance to review the change before it gets added to the
>>> > already large RAT-366.
>>>
>>> +1
>>>
>>> Thanks for your deep dive into RAT!
>>>
>>> Cheers,
>>> Phil
>>>
>>
>>
>> --
>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>
>
>
> --
> LinkedIn: http://www.linkedin.com/in/claudewarren
>

--
LinkedIn: http://www.linkedin.com/in/claudewarren
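
For anyone who wants to read the XML rather than parse the text report, here is a minimal sketch of walking the layout described in the Mar 29 message: "resource" elements with name and type attributes, "license" children with approval, family, id and name attributes, and a "sample" child wrapped in CDATA. The root element name ("rat-report"), the "true"/"false" values for approval, and the sample text are assumptions made for illustration; check the example output linked above for the real values.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. Element and attribute names follow the
# description in the thread; the root element name, the approval values
# and the sample text are assumptions.
SAMPLE = """<rat-report>
  <resource name="src/test/resources/elements/tri.txt" type="standard">
    <license approval="true" family="AL" id="AL" name="Apache License Version 2.0"/>
    <license approval="true" family="BSD-3" id="BSD-3" name="BSD 3 clause"/>
    <license approval="true" family="BSD-3" id="TMF" name="The Telemanagement Forum License"/>
  </resource>
  <resource name="src/test/resources/elements/Source.java" type="standard">
    <license approval="false" family="?????" id="?????" name="Unknown license"/>
    <sample><![CDATA[package elements;]]></sample>
  </resource>
  <resource name="src/test/resources/elements/Image.png" type="binary"/>
</rat-report>"""


def licenses_by_resource(xml_text):
    """Map each resource name to a list of (license id, approved) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for resource in root.iter("resource"):
        result[resource.get("name")] = [
            (lic.get("id"), lic.get("approval") == "true")
            for lic in resource.findall("license")
        ]
    return result


def unapproved(xml_text):
    """Names of resources with at least one license that is not approved."""
    return [
        name
        for name, licenses in licenses_by_resource(xml_text).items()
        if any(not approved for _, approved in licenses)
    ]


if __name__ == "__main__":
    for name, licenses in licenses_by_resource(SAMPLE).items():
        print(name, licenses)
    print("unapproved:", unapproved(SAMPLE))
```

A file with several matches, such as tri.txt, simply yields several entries in its list, so there is no need for a separate "MULTIPLE" state on the reading side.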