Hi, as discussed during last Akademy's KF6 BoF and the KF6 sprint, as well as
on the KDE community list, SPDX license information can help us a lot in
maintaining good (and automatically checkable) license quality throughout our
libraries. As of today, SPDX markers are also covered by the KDE licensing
policy, and I want to start porting the existing license headers to the new
format.
For anybody without experience with SPDX markers, the following pages are now
available as good starting points:
- KDE Licensing Policy: https://community.kde.org/Policies/Licensing_Policy
- (a very new) Licensing HowTo: https://community.kde.org/Guidelines_and_HOWTOs/Licensing
- REUSE.software: http://reuse.software
The question now is how to introduce the SPDX markers in a reasonable way.
For some time I have been working on a conversion tool, which works quite well
by now (it converts about 98% of the headers and leaves the remaining ones
for manual work) [1]. The approach I followed was:
1. For every license header in KF5 (about 130 versions of statements), I
created a plain text file that contains that license statement.
2. All license header files with the same meaning (e.g. all LGPL-2.0-or-later
headers) are combined into one regular expression that also tolerates any
whitespace, line breaks and "*" characters.
3. For every individual license header, an original source code file is
included as a reference; a unit test uses it to verify that the license is
detected correctly.
4. My tool has a "--convert" option that replaces every regular expression
match (only if it could be detected unambiguously!) with the corresponding
SPDX expression and adds the respective license files to the root folder of
the project.
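To illustrate steps 1, 2 and 4, here is a minimal sketch in Python. It is not the licensedigger code: the names header_pattern, build_matchers and convert are made up for this mail, the header text is just one example statement, and the sketch assumes that only whitespace and "*" characters may appear between the words of a statement.

```python
import re

# Step 1 (hypothetical sample): one plain-text license statement per variant,
# grouped by the SPDX identifier it corresponds to.
HEADER_VARIANTS = {
    "LGPL-2.0-or-later": [
        "This library is free software; you can redistribute it and/or "
        "modify it under the terms of the GNU Library General Public "
        "License as published by the Free Software Foundation; either "
        "version 2 of the License, or (at your option) any later version.",
    ],
}

# Assumption: between two words there may be any mix of whitespace,
# line breaks and "*" comment decoration.
FILLER = r"[\s*]+"


def header_pattern(text):
    """Turn a plain-text statement into a layout-tolerant regex (step 2)."""
    return FILLER.join(re.escape(word) for word in text.split())


def build_matchers(variants):
    """Combine all variants with the same meaning into one regex per SPDX id."""
    return {
        spdx: re.compile("|".join("(?:%s)" % header_pattern(t) for t in texts))
        for spdx, texts in variants.items()
    }


def convert(source, matchers):
    """Replace the header only if exactly one license matches (step 4)."""
    hits = [(spdx, rx) for spdx, rx in matchers.items() if rx.search(source)]
    if len(hits) != 1:
        return source  # unknown or ambiguous: leave it for manual work
    spdx, rx = hits[0]
    marker = "SPDX-License-Identifier: " + spdx
    return rx.sub(lambda match: marker, source, count=1)
```

Calling convert(text, build_matchers(HEADER_VARIANTS)) on a file whose comment block contains the statement above leaves the comment delimiters in place and swaps only the statement for the SPDX marker. Files that match no license, or more than one, come back unchanged.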
In the KF5 repositories there are slightly more than 9000 files with copyright
information that I want to convert. I plan to provide a patch for every single
(non porting-aid) KF5 repository, starting with the Tier 1 repositories. Each
patch will contain the changes created by my licensedigger tool, plus possibly
a few style changes (meaning whitespace removal or removal of "*" characters).
Any license statement that I had to convert by hand will be in a separate
commit and explicitly mentioned in the pull request.
Does this approach sound reasonable? If anybody wants to review my conversion
tool and the license-header-to-SPDX translations, I would be happy to receive
feedback!
As a first trial balloon, I created a patch for KIdletime [2], mostly because
it is one of the repositories that almost never sees a change, so a pull
request can stay open for a longer time if discussion is needed.
Cheers,
Andreas
[1] https://cgit.kde.org/scratch/cordlandwehr/licensedigger.git/
[2] https://phabricator.kde.org/D26931