Jonas Smedegaard:
> [..]
> 
> Have a look (if interested) at /usr/share/perl5/String/Copyright.pm and 
> in particular the (huge when expanded) $signs_and_more_re at line 138.
> 
> [..]

Thanks for the tips! I'm not sure whether you got my other follow-ups to the
bug report. I did in fact find String::Copyright, but I didn't know about its
history or the plans for it, so thanks for filling me in on that.

At any rate, here is an updated version of my patch, along with some test cases 
for Sage's copyright notices.

I did try to think of a way to achieve the same logic *inside* the massive $re 
regexes. However I don't think this is possible, at least with my current 
approach - which tries to be conservative in order to adapt to humans being 
annoyingly inconsistent.

What it does is join a subsequent line onto the main line (the one with the
"Copyright" part) only when that line's indent is greater than the main
line's. This means I have to call length() in an expression-replacement
(s///e), which I don't think is possible to do inside a normal regex...
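
To illustrate the idea outside the module, here is a minimal standalone sketch.
It is not the patch itself: $preamble_re and $notice_re are simplified,
hypothetical stand-ins for the real String::Copyright patterns, and
join_continuations() and _merge() are names made up for this example.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-ins for the patterns in String::Copyright.
my $preamble_re = qr/(?:#|\/\/|\/\*)?[ \t]*/;    # comment leader plus indent
my $notice_re   = qr/Copyright\b[^\n]*/;         # the "main" line

# Decide what replaces "main line, newline, next line's preamble".
sub _merge {
    my ( $lead, $notice, $next_lead ) = @_;

    # Next line is not indented deeper: put everything back unchanged.
    return "$lead$notice\n$next_lead"
        if length($next_lead) <= length($lead);

    # Deeper indent: join, adding ", " unless the line already ends in
    # "and" or a comma.
    return $notice =~ /(?:\band|,)$/ ? "$lead$notice " : "$lead$notice, ";
}

# Repeat until nothing changes, so that several continuation lines in a
# row are all folded into the same main line.
sub join_continuations {
    my ($text) = @_;
    my $old;
    do {
        $old = $text;
        $text =~ s/^($preamble_re)($notice_re)\n($preamble_re)(?=\S)/_merge($1, $2, $3)/gme;
    } while ( $text ne $old );
    return $text;
}

print join_continuations(
    "# Copyright 2005 Alice\n#   2006 Bob\n#   2007 Carol\n# License: GPL\n");
```

The length() comparison happens in Perl code evaluated by the /e modifier, which
is why it cannot simply be folded into one of the qr// patterns. The do/while
loop is needed because each pass only joins one continuation line per notice.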

As for speed, the patch makes this run about 4 seconds (roughly 13%) slower:

# with the patch
$ time debian/rules debian/licensecheck.copyright
licensecheck -l250 -i ^sage/build/ -r --deb-machine --merge-licenses sage > "debian/licensecheck.copyright"

real    0m35.318s
user    0m35.204s
sys     0m0.056s

# without the patch
$ time debian/rules debian/licensecheck.copyright
licensecheck -l250 -i ^sage/build/ -r --deb-machine --merge-licenses sage > "debian/licensecheck.copyright"

real    0m31.168s
user    0m31.040s
sys     0m0.076s

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
--- /usr/share/perl5/String/Copyright.pm.old	2016-11-30 20:08:44.000000000 +0100
+++ /usr/share/perl5/String/Copyright.pm	2017-07-05 21:02:01.060002642 +0200
@@ -104,7 +104,7 @@
 my $comma_re
 	= qr/$blank_re*,$blank_or_break_re|$blank_or_break_re,?$blank_re*/;
 my $dash_re
-	= qr/$blank_re*[-˗‐‑‒–—―⁃−﹣-]$blank_or_break_re*/;
+	= qr/$blank_re*[-˗‐‑‒–—―⁃−﹣-]+$blank_or_break_re*/;
 my $owner_intro_re   = qr/\bby$blank_or_break_re/;
 my $owner_prefix_re  = qr/[(*<@[{]/;
 my $owner_initial_re = qr/[^\s!\"#$%&'()*+,.\/:;<=>?@[\\\]^_`{|}~]/;
@@ -135,6 +135,8 @@
 my $years_re    = qr/$yearspan_re(?:$comma_re$yearspan_re)*/;
 my $owners_re   = qr/$owner_prefix_re*$owner_initial_re\S*(?:$blank_re*\S+)*/;
 
+my $line_preamble_re
+	= qr/(?:#|\/\/|\/\*)?\s*/;
 my $signs_and_more_re
 	= qr/(?:$chatter_re.*|$signs_re(?::$blank_or_break_re|$comma_re)$broken_sign_re?($years_re?$comma_re?$owner_intro_re?$owners_re?)|(?:\n|\z))/;
 
@@ -155,6 +157,14 @@
 
 		# stringify objects
 		$copyright = "$copyright";
+		# concatenate multi-line notices together
+		my $old_copyright;
+		do {
+			$old_copyright = $copyright;
+			$copyright =~ s/((?:^|\n)$line_preamble_re)($signs_and_more_re,?)\n($line_preamble_re)/
+			(length $4 <= length $1)? "$1$2\n$4":
+			(sub{ shift =~ m{(?:\band|,)$}; })->($2)? "$1$2 ": "$1$2, "/eg;
+		} while ($copyright ne $old_copyright);
 
 		# TODO: also parse @_ - but each separately!
 		my @block;

Attachment: copyright-test.sh
Description: application/shellscript
