I changed the standard report slightly.  It now looks like:

 s src/test/resources/elements/ILoggerFactory.java
    MIT      MIT           The MIT License

 b src/test/resources/elements/Image.png

 n src/test/resources/elements/LICENSE

 n src/test/resources/elements/NOTICE

!s src/test/resources/elements/Source.java
    ?????    ?????         Unknown license

 s src/test/resources/elements/Text.txt
    AL       AL            Apache License Version 2.0

 s src/test/resources/elements/TextHttps.txt
    AL       AL            Apache License Version 2.0

 s src/test/resources/elements/Xml.xml
    AL       AL            Apache License Version 2.0

 s src/test/resources/elements/buildr.rb
    AL       AL            Apache License Version 2.0

 a src/test/resources/elements/dummy.jar

 g src/test/resources/elements/generated.txt
    GEN      GEN           Generated Files

 b src/test/resources/elements/plain.json

 s src/test/resources/elements/tri.txt
    AL       AL            Apache License Version 2.0
    BSD-3    BSD-3         BSD 3 clause
    BSD-3    TMF           The Telemanagement Forum License

!s src/test/resources/elements/sub/Empty.txt
    ?????    ?????         Unknown license


*****************************************************
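
For anyone who still reads the per-file section of the text report programmatically, the layout above can be sketched as a small Java reader. This is not part of RAT, and the class, method and field names are hypothetical. It assumes an entry line is '!' or a space, then the type character, a space and the path, and that each license line is indented four spaces and holds three columns. The report does not label those columns; treating them as family, id and name is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of RAT: reads the per-file section of the text report. */
final class StandardReportSketch {

    /** One indented license line below a file entry. */
    static final class LicenseLine {
        final String family; // first column (assumed to be the license family)
        final String id;     // second column (assumed to be the license id)
        final String name;   // remainder of the line
        LicenseLine(String family, String id, String name) {
            this.family = family;
            this.id = id;
            this.name = name;
        }
    }

    /** One file entry such as "!s src/test/resources/elements/Source.java". */
    static final class Entry {
        final boolean unapproved; // true when the line starts with '!'
        final char type;          // a, b, g, n, s or u
        final String path;
        final List<LicenseLine> licenses = new ArrayList<>();
        Entry(boolean unapproved, char type, String path) {
            this.unapproved = unapproved;
            this.type = type;
            this.path = path;
        }
    }

    static List<Entry> parse(String report) {
        List<Entry> entries = new ArrayList<>();
        Entry current = null;
        for (String line : report.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // blank lines only separate entries
            }
            if (line.startsWith("    ")) {
                // License line: belongs to the most recent file entry.
                String[] cols = line.trim().split("\\s+", 3);
                if (current != null && cols.length == 3) {
                    current.licenses.add(new LicenseLine(cols[0], cols[1], cols[2]));
                }
            } else if (line.length() > 3
                    && (line.charAt(0) == '!' || line.charAt(0) == ' ')
                    && line.charAt(2) == ' ') {
                // Entry line: approval marker, type character, space, path.
                current = new Entry(line.charAt(0) == '!', line.charAt(1), line.substring(3));
                entries.add(current);
            } else {
                current = null; // separator rows or any other text
            }
        }
        return entries;
    }
}
```

Calling StandardReportSketch.parse(reportText) returns one Entry per file, with its license lines attached in report order. A custom XSLT over the XML output remains the more robust way to extract this information.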

On Fri, Mar 29, 2024 at 3:50 PM Claude Warren <cla...@xenei.com> wrote:

> I have a proposed change.  See
> https://github.com/Claudenw/creadur-rat/pull/6/files
> Note that this pull request is the difference between multiple targets and
> the change to move to RAT-366 (Move to single match call)
>
> Example output in
> https://github.com/Claudenw/creadur-rat/tree/Multiple_license_report/apache-rat/src/site/examples
>
> I reworked the MetaData class and removed all the funky naming.  All we
> really needed to capture for a document is what licenses matched and which
> of those are approved licenses.
>
> The new rat report (in examples) has a "resource" element for each file
> that was checked.  The resource still has a name attribute and I added a
> type attribute that specifies the type of file that it is (e.g. archive,
> standard, binary).  It has two possible child elements "license" and
> "sample"
>
> The license element has several attributes: approval, family, id, and name.
> A license can have a notes child element that contains the notes for the
> license.  These are not usually displayed but are included for the
> generated files license.
>
> The sample element contains text from the license.  It is only included
> when the license type is unknown.
>
> The sample and notes text is enclosed in a CDATA block.
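
A minimal Java sketch of reading that structure with the JDK's DOM parser, collecting the license ids found for each resource. It relies only on the element and attribute names listed above ("resource", "license", "sample", "name", "type", "id" and so on). The class name, the root element name, the attribute values and the sample document are assumptions made for illustration; they are not copied from the real report.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper, not part of RAT: lists the license ids reported for each resource. */
final class XmlReportSketch {

    /** Made-up sample; the root element name and the attribute values are assumptions. */
    static final String SAMPLE =
        "<report>"
        + "<resource name='src/tri.txt' type='standard'>"
        + "<license approval='true' family='AL' id='AL' name='Apache License Version 2.0'/>"
        + "<license approval='true' family='BSD-3' id='TMF' name='The Telemanagement Forum License'/>"
        + "</resource>"
        + "<resource name='src/Source.java' type='standard'>"
        + "<license approval='false' family='?????' id='?????' name='Unknown license'/>"
        + "<sample><![CDATA[package elements;]]></sample>"
        + "</resource>"
        + "</report>";

    /** Maps each resource name to the ids of all license elements below it, in document order. */
    static Map<String, List<String>> licenseIdsByResource(String xml) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList resources = doc.getElementsByTagName("resource");
            for (int i = 0; i < resources.getLength(); i++) {
                Element resource = (Element) resources.item(i);
                List<String> ids = new ArrayList<>();
                // A resource may carry several license children; keep every one of them.
                NodeList licenses = resource.getElementsByTagName("license");
                for (int j = 0; j < licenses.getLength(); j++) {
                    ids.add(((Element) licenses.item(j)).getAttribute("id"));
                }
                result.put(resource.getAttribute("name"), ids);
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse report XML", e);
        }
        return result;
    }
}
```

XmlReportSketch.licenseIdsByResource(XmlReportSketch.SAMPLE) returns a map keyed by resource name; a resource with no license children maps to an empty list.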
>
> I reworked the standard report.  This is probably a breaking change for
> anyone who is parsing the text, but then they should be using a custom XSLT
> to extract the info they want.
>
> The new report looks like:
>
>
> *****************************************************
> Summary
> -------
> Generated at: 2024-03-29T15:01:24+01:00
>
> Notes: 2
> Binaries: 2
> Archives: 1
> Standards: 8
>
> Apache Licensed: 5
> Generated Documents: 1
>
> JavaDocs are generated, thus a license header is optional.
> Generated files do not require license headers.
>
> 2 Unknown Licenses
>
> *****************************************************
>
> Files with unapproved licenses:
>
>   src/test/resources/elements/Source.java
>   src/test/resources/elements/sub/Empty.txt
>
> *****************************************************
>
> *****************************************************
>   Documents with unapproved licenses will start with a '!'
>   The next character identifies the document type.
>
>    char         type
>     a       Archive file
>     b       Binary file
>     g       Generated file
>     n       Notice file
>     s       Standard file
>     u       Unknown file.
>
>  s src/test/resources/elements/ILoggerFactory.java
>     MIT   The MIT License
>  b src/test/resources/elements/Image.png
>  n src/test/resources/elements/LICENSE
>  n src/test/resources/elements/NOTICE
> !s src/test/resources/elements/Source.java
>     ????? Unknown license
>  s src/test/resources/elements/Text.txt
>     AL    Apache License Version 2.0
>  s src/test/resources/elements/TextHttps.txt
>     AL    Apache License Version 2.0
>  s src/test/resources/elements/Xml.xml
>     AL    Apache License Version 2.0
>  s src/test/resources/elements/buildr.rb
>     AL    Apache License Version 2.0
>  a src/test/resources/elements/dummy.jar
>  g src/test/resources/elements/generated.txt
>     GEN   Generated Files
>  b src/test/resources/elements/plain.json
>  s src/test/resources/elements/tri.txt
>     AL    Apache License Version 2.0
>     BSD-3 BSD 3 clause
>     TMF   The Telemanagement Forum License
> !s src/test/resources/elements/sub/Empty.txt
>     ????? Unknown license
>
> *****************************************************
>
> I think this solves the problem.
>
> Claude
>
> On Thu, Mar 28, 2024 at 10:17 AM Claude Warren <cla...@xenei.com> wrote:
>
>> SPDX[1] has an interesting format where they can report 2 (or more?)
>> licenses in one.
>>
>> There are a couple of things here that we will need to look at:
>>
>>    1. Metadata only stores one matching license.
>>    2. Can we modify the output XML to list multiple licenses for a file
>>    without too much trouble?  I don't think the existing XSLT will
>>    have problems with it.
>>    3. SPDX [1] has an interesting format where they can report 2 (or
>>    more?) licenses in one.  Perhaps we should use their format for license
>>    identification.  This would allow us to report the SPDX tags that
>>    reference multiple licenses.
>>
>> Also, every time I look at the LicenseFamily code I wonder why there is a
>> limit of 5 on the number of characters in the license family category.  It
>> feels like a formatting issue was pushed into the internal code.  Drives me
>> crazy.
>>
>> [1] https://spdx.dev/learn/handling-license-info/
>>
>> On Thu, Mar 28, 2024 at 10:01 AM P. Ottlinger <pottlin...@apache.org>
>> wrote:
>>
>>> Hi,
>>>
>>> On 28.03.24 at 09:41, Claude Warren wrote:
>>> > I got back to looking at 366 and discovered a problem that I think has
>>> > been lurking in the system for some time.  Basically, if a file has the
>>> > signatures for more than one license only one will be reported, and the
>>> > selection of which one is (I think) random.
>>>
>>> thanks for analyzing this issue, which explains some random test
>>> failures ..... :(
>>>
>>> <snip>
>>>
>>> > My suggestion is we report all license matches and let the user decide
>>> > what to do.
>>>
>>> I'm in favour of reporting as many licenses as possible, but assume this
>>> will break the current report format, that is optimized for one license
>>> only.
>>>
>>> Not sure if downstream users have problems with that change?!
>>>
>>> Would we have a maximum license number or could this result in an
>>> "endless" list of reported licenses, if a file with "all" conceivable
>>> license files is provided to RAT? Initially I thought of adding a new
>>> analyzer/reporting state "MULTIPLE" that is reported in the scan and a
>>> detailed report that lists up to x (maybe 3 or 5?) maximum licenses per
>>> file - WDYT?
>>>
>>> >
>>> > My plan is to create a branch that reports multiple matching licenses
>>> > and then merge that into RAT-366 to resolve the problem.  This should
>>> > give us all a chance to review the change before it gets added to the
>>> > already large RAT-366.
>>>
>>> +1
>>>
>>> Thanks for your deep dive into RAT!
>>>
>>> Cheers,
>>> Phil
>>>
>>
>>
>> --
>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>
>
>
> --
> LinkedIn: http://www.linkedin.com/in/claudewarren
>


-- 
LinkedIn: http://www.linkedin.com/in/claudewarren
