Thanks for the additional feedback and suggestions. Responses inline below.
Gary

> -----Original Message-----
> From: Jason van Zyl [mailto:[email protected]]
> Sent: Monday, January 27, 2014 1:14 PM
> To: Maven Developers List
> Subject: Re: SPDX Maven Plugin
>
> On Jan 27, 2014, at 3:35 PM, Gary O'Neall <[email protected]> wrote:
>
> > Hi Jason,
> >
> > Thanks for the response and suggestions.
> >
> > Currently, the plugin is only taking information from the POM and
> > formatting it as an SPDX file.
> >
> > With the plugin, I am primarily targeting developers and organizations
> > that originate the code. In this case, the license information in
> > the POM is (hopefully) accurate.
>
> I think even in this case you will run into problems because it's
> usually never intentional to mis-state the licenses being used but
> unless the output is the result of analysis (machine/human/both) it's
> not going to be very accurate. You also have the transitivity issue as
> well. If you're not trying to account for this then it's also not very
> useful IMO as that's most of the problem with Maven.
>

I agree in the case where a project is including source from other
projects or origins. In that case, the code should be analyzed and the
license information verified. For the case where the source code
originates with the organization or person, there is only the issue of
inconsistencies created by that organization or person between what is
stated in the POM and any license headers they included in the source.
Although I do run into this on occasion, it is not as much of an issue
as projects which include source from other projects.

To address this issue, we could include the results of a previous
analysis (either generated by a plugin, an independent tool or a human).
I'll add that as a feature to the plugin. I'm thinking of allowing an
existing SPDX file to be used to specify the results of the analysis
(most tools support this format today).
The plugin could then "fill in the blanks", generate the checksums,
report any inconsistencies and update the SPDX file. In the case of
original code, it would just use the parameters in the POM file. A
similar approach should take care of transitive issues. I'll also
forward your feedback to the SPDX community and see if there are some
other ideas to address this concern.

> > This does depend on the developer or organization updating the license
> > information in the POM file. From my experience, this is not always
> > done - but in the case where the license information is included, the
> > SPDX file would be generated, saving the developer the effort of
> > creating the SPDX file with an external tool (or worse - manually).
> >
> > I do have a TODO to scan the source files for specific SPDX license
> > identifiers. There is an SPDX effort underway to standardize a format
> > for license information which could be put in the comments within the
> > source files.
> >
> > I agree with your issue that it could be misleading if we state
> > licenses that are not validated. To address this, the default license
> > is "NOASSERTION" if the license is not specifically stated in the POM
> > file for the project or the files. If the POM file contains an actual
> > license reference, that is used. There are configuration parameters to
> > control this behavior, but the defaults are set conservatively to
> > address this issue.
> >
> > I'm familiar with JNinka and it would be a good parser to use, but it
> > has some licensing issues (licensed under AGPL which would make it
> > incompatible with some of the Apache projects).
>
> It's just a tool, and a plugin that integrated it can be used in a
> build without issue. It just generates the data. Easy enough to take
> the code, put it on AWS and feed it source JARs to get the results.
> Would be nice to try and use all available scanners and then you
> can cross reference the results.
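
To make the cross-referencing idea concrete, here is a minimal Java
sketch. It is not code from the plugin or from any scanner. It assumes
each scanner's output has already been reduced to a map from file path to
a single license identifier; the class name ScannerCrossReference, the
method disagreements, and the scanner names and file paths in main are
all hypothetical. It only flags files where scanners report different
licenses so that a person can review them; it does not decide which
scanner is right.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: cross-references per-file license findings from several scanners. */
public final class ScannerCrossReference {

    private ScannerCrossReference() {}

    /**
     * @param findingsByScanner scanner name -> (file path -> license identifier it reported)
     * @return file path -> distinct identifiers, only for files where scanners disagree
     */
    public static Map<String, Set<String>> disagreements(
            Map<String, Map<String, String>> findingsByScanner) {
        // Collect every identifier reported for each file, across all scanners.
        Map<String, Set<String>> licensesByFile = new TreeMap<>();
        for (Map<String, String> findings : findingsByScanner.values()) {
            for (Map.Entry<String, String> finding : findings.entrySet()) {
                licensesByFile
                        .computeIfAbsent(finding.getKey(), path -> new TreeSet<>())
                        .add(finding.getValue());
            }
        }
        // Drop files with a single distinct identifier. Note that a file seen
        // by only one scanner also has one identifier and is not flagged.
        licensesByFile.values().removeIf(ids -> ids.size() < 2);
        return licensesByFile;
    }

    public static void main(String[] args) {
        // Made-up findings for illustration only.
        Map<String, Map<String, String>> findings = new LinkedHashMap<>();
        findings.put("scannerA", Map.of("src/Foo.java", "Apache-2.0", "src/Bar.java", "MIT"));
        findings.put("scannerB", Map.of("src/Foo.java", "Apache-2.0", "src/Bar.java", "BSD-3-Clause"));

        disagreements(findings).forEach(
                (file, ids) -> System.out.println(file + " needs review: " + ids));
    }
}
```

With the sample data, only src/Bar.java is reported, since both scanners
agree on src/Foo.java. The output is a review list for a human, not a
license conclusion.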

In thinking about it some more, I believe it would be dangerous to have
a scanning tool build the SPDX file as the authoritative SPDX file for a
project. They just have too many "false positives". I've worked with a
lot of the tools (including Ninka) and they all have this issue to some
degree (even the Source Auditor tools that my company uses). A safer
approach would be to have the scanning tool provide an "audit" of an
existing SPDX file and report inconsistencies. A human could then review
these results and update the SPDX file appropriately.

Once I finish with the SPDX-generating plugin, I can take a look at
wrapping one of the existing scanning tools which produce SPDX to
produce such an audit tool.

BTW - there are a couple of existing websites that do this scanning
today:
- https://sites.google.com/site/fossologyunospdx/ and
  https://fossologyspdx.ist.unomaha.edu/
- http://spdx.windriver.com/pkg_upload_Update.aspx

Once I add the feature to import SPDX files, you can run one of these
tools and then use its output in the SPDX Maven plugin, which would keep
the checksums up to date.

> > Like all code scanners, it is not 100% accurate which could lead to
> > some misleading results in some cases. I find that you always need a
> > human to validate the results of the source code scanners. That being
> > said, somehow integrating code scanning into the plugin would
> > definitely save time for those cases where the POM file is not
> > generated by the originator of the code. Perhaps there could be a
> > JNinka Maven plugin which creates some data that the SPDX plugin
> > consumes to create the SPDX file. Something worth exploring. Let me
> > know if you have any thoughts on how this may be structured in Maven.
> >
>
> One note on your plugin is that I would separate all the non-Maven
> logic and make a simple library JAR you can use instead. Make the
> Mojo the smallest of wrappers around your base code.
> That way you can use the code from Ant/Gradle or other build tools as
> well if that's your target.
>

Excellent suggestion. I'll restructure per your suggestion. Thanks!

> > Gary
> >
> >> -----Original Message-----
> >> From: Jason van Zyl [mailto:[email protected]]
> >> Sent: Monday, January 27, 2014 11:06 AM
> >> To: Maven Developers List
> >> Subject: Re: SPDX Maven Plugin
> >>
> >> Gary,
> >>
> >> Does the plugin do any source level scanning to determine licenses,
> >> or are you just taking the information from the POM? If you're using
> >> something like JNinka[1] to figure out what licenses are actually
> >> present then that's a first step. Actually determining if it's a
> >> valid set of licenses is another step. But if it's just taking
> >> information from the POM I'm not sure that's useful and may
> >> potentially be more harmful as people might think "Oh, it has an SPDX
> >> descriptor so it must be legally accurate."
> >>
> >> [1]: https://github.com/whitesource/jninka
> >>
> >> On Jan 20, 2014, at 12:20 PM, Gary O'Neall <[email protected]> wrote:
> >>
> >>> Greetings all,
> >>>
> >>> I am somewhat new to Maven plugin development and would like to ask
> >>> the developer community for some feedback and help in developing a
> >>> plugin to generate project license metadata compliant with the
> >>> Software Package Data Exchange (SPDX) standard.
> >>>
> >>> The SPDX specification is a standard format for communicating the
> >>> components, licenses and copyrights associated with a software
> >>> package. We are on version 1.2 of the spec, and it is in use at
> >>> several of the SPDX participating companies (see www.spdx.org for
> >>> more info).
> >>>
> >>> The motivation for the plugin was the result of a discussion between
> >>> Phil Odence and myself (from SPDX) and Jim Jagielski and Henri Yandell
> >>> (from Apache) on ideas for how Apache projects could produce or
> >>> utilize SPDX.
> >>> It was suggested that a Maven plugin would substantially reduce
> >>> the effort for several Apache projects.
> >>>
> >>> Over the past couple of weeks, I have studied the Maven Mojo and
> >>> plugin APIs and produced a prototype which will generate an SPDX
> >>> file based on a POM file. You can find the code on GitHub at
> >>> https://github.com/goneall/spdx-maven-plugin
> >>>
> >>> Here are my questions for the Maven developers:
> >>> - Is anyone on the list interested and have some time to collaborate
> >>>   on the plugin? I'm pretty comfortable on the Java and SPDX output
> >>>   coding, but I'm new to Maven and could use some experienced review
> >>>   of some of my choices regarding Maven parameters and implementation.
> >>>   A review of the code would be most appreciated. I could also post
> >>>   the more specific questions to this list if that is appropriate.
> >>>
> >>> - Are there any related efforts I should be aware of? (Note: I did
> >>>   find another SPDX Maven plugin on GitHub. I reached out to the
> >>>   author and have not heard back, so I'm not sure how active the
> >>>   project is.)
> >>>
> >>> - Once the plugin is built and unit tested, what is the best way to
> >>>   make it more accessible to other developers?
> >>>
> >>> - A spreadsheet mapping the SPDX properties to either
> >>>   spdx-maven-prototype configuration parameters or existing Maven
> >>>   model properties can be downloaded at
> >>>   https://github.com/goneall/spdx-maven-plugin/blob/master/SPDX-fields-maven-mapping.xlsx
> >>>   Feedback on the prototype choices is welcome. There is also a
> >>>   proposed longer term mapping, some of which extends the current
> >>>   Maven model.
> >>>
> >>> Thanks in advance,
> >>> Gary
> >>>
> >>>
> >>> -------------------------------------------------
> >>> Gary O'Neall
> >>> Principal Consultant
> >>> Source Auditor Inc.
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
> >>>
> >>
> >> Thanks,
> >>
> >> Jason
> >>
> >> ----------------------------------------------------------
> >> Jason van Zyl
> >> Founder, Apache Maven
> >> http://twitter.com/jvanzyl
> >> http://twitter.com/takari_io
> >> ---------------------------------------------------------
> >>
> >> To think is easy. To act is hard. But the hardest thing in the world
> >> is to act in accordance with your thinking.
> >>
> >> -- Johann von Goethe
> >
> Thanks,
>
> Jason
>
> ----------------------------------------------------------
> Jason van Zyl
> Founder, Apache Maven
> http://twitter.com/jvanzyl
> http://twitter.com/takari_io
> ---------------------------------------------------------
>
> A man enjoys his work when he understands the whole and when he is
> responsible for the quality of the whole
>
> -- Christopher Alexander, A Pattern Language
