Bug#951726: licensecheck: option --encoding is not propagated during recursive scan

2020-02-21 Thread Dominique Dumont
On Thursday, 20 February 2020 21:29:40 CET you wrote:
> Thanks for an excellently framed bugreport!

You're most welcome :)

> (enabling --verbose also reveals that Licensecheck wrongly treats Encode
> objects as strings, as seen with the HASH string in the warning message)

Which may explain why licensecheck cannot read moar/05-decoder.t, which is a
UTF-8 file with some Cyrillic characters.

All the best



Bug#951726: licensecheck: option --encoding is not propagated during recursive scan

2020-02-20 Thread Jonas Smedegaard
Control: tag -1 confirmed

Hi Dominique,

Quoting Dominique Dumont (2020-02-20 17:15:29)
> While packaging nqp, I've noticed a discrepancy in licensecheck output:
> 
> licensecheck correctly reports the absence of information when
> scanning nqp/115-nums.t file from nqp directory:
> 
> $ licensecheck --encoding utf8 --copyright --machine --recursive nqp | grep 115
> nqp/115-nums.t  UNKNOWN *No copyright*
> 
> licensecheck incorrectly reports garbage when scanning nqp/115-nums.t
> file from current directory:
> 
> $ licensecheck --encoding utf8 --copyright --machine --recursive . | grep 115
> ./nqp/115-nums.t  UNKNOWN ೨೪, ೫e-೩೨೪, '6e-324 denormal 
> equates to 5e-324 denormal (Uni)'); / ೨೪, ೫e-೩೨೪, '5e-324 
> denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೧e-೩೨೩, 
> '9e-324 denormal is 1e-323 (Uni)'); / ೨೪, ೦e೦, 'denormal 5e-324 is 
> recognized and is not 0 (Uni)'); / ೨೪, ೦e೦, '2e-324 denormal is 0e0 
> (Uni)'); / e-೩೨೪, ೫e-೩೨೪, '2e-324 denormal equates to 5e-324 
> denormal (Uni)');
> 
> The mis-decoded file contains a © character, hence the mojibake garbage.
> 
> I would expect --encoding utf8 option to be used to read all files.

Thanks for an excellently framed bugreport!

The cause for the difference in output is revealed in --verbose mode:

$ licensecheck --encoding utf8 --copyright --machine --recursive --verbose . | grep '115\|cannot be read'
file moar/05-decoder.t cannot be read with 
App::Licensecheck=HASH(0x563009ec7500)->encoding; encoding, will try latin-1:
- nqp/115-nums.t header -
./nqp/115-nums.t  UNKNOWN ೨೪, ೫e-೩೨೪, '6e-324 denormal 
equates to 5e-324 denormal (Uni)'); / ೨೪, ೫e-೩೨೪, '5e-324 denormal 
equates to 5e-324 denormal (Uni)'); / ೨೪, ೧e-೩೨೩, '9e-324 denormal 
is 1e-323 (Uni)'); / ೨೪, ೦e೦, 'denormal 5e-324 is recognized and is not 
0 (Uni)'); / ೨೪, ೦e೦, '2e-324 denormal is 0e0 (Uni)'); / e-೩೨೪, 
೫e-೩೨೪, '2e-324 denormal equates to 5e-324 denormal (Uni)');

Licensecheck chokes on moar/05-decoder.t and re-reads it as latin-1.

...but then licensecheck _continues_ to read following files as latin-1, 
which is wrong.
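
To illustrate the pattern described above, here is a minimal sketch in
Python. It is not licensecheck's actual code (which is Perl); the function
names and the sample byte strings are hypothetical, and the first file is
simply assumed to contain bytes that are invalid as UTF-8. It contrasts a
fallback that leaks into later files with one that stays local to the file
that failed:

```python
# Hypothetical sketch, not licensecheck's implementation.

def read_all_sticky(blobs, encoding="utf-8"):
    """Buggy pattern: a failed decode overwrites the shared encoding."""
    results = {}
    for name, data in blobs.items():
        try:
            results[name] = data.decode(encoding)
        except UnicodeDecodeError:
            encoding = "latin-1"  # bug: the fallback sticks for later files
            results[name] = data.decode(encoding)
    return results


def read_all_scoped(blobs, encoding="utf-8"):
    """Expected pattern: the fallback applies only to the failing file."""
    results = {}
    for name, data in blobs.items():
        try:
            results[name] = data.decode(encoding)
        except UnicodeDecodeError:
            results[name] = data.decode("latin-1")
    return results


# Assumed sample data: the first entry is invalid UTF-8, the second is valid.
blobs = {
    "moar/05-decoder.t": b"\xff\xfe not valid UTF-8",
    "nqp/115-nums.t": "© 2020".encode("utf-8"),
}

print(repr(read_all_sticky(blobs)["nqp/115-nums.t"]))
print(repr(read_all_scoped(blobs)["nqp/115-nums.t"]))
```

In the sticky variant the second file is decoded as latin-1 only because an
earlier file failed, which matches the symptom of the result depending on
which directory the scan starts from.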

(enabling --verbose also reveals that Licensecheck wrongly treats Encode 
objects as strings, as seen with the HASH string in the warning message)


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private



Bug#951726: licensecheck: option --encoding is not propagated during recursive scan

2020-02-20 Thread Dominique Dumont
Package: licensecheck
Version: 3.0.44-1
Severity: normal

Dear Maintainer,

While packaging nqp, I've noticed a discrepancy in licensecheck output:

licensecheck correctly reports the absence of information when
scanning nqp/115-nums.t file from nqp directory:

$ licensecheck --encoding utf8 --copyright --machine --recursive nqp | grep 115 
nqp/115-nums.t  UNKNOWN *No copyright*

licensecheck incorrectly reports garbage when scanning nqp/115-nums.t
file from current directory:

$ licensecheck --encoding utf8 --copyright --machine --recursive . | grep 115
./nqp/115-nums.t  UNKNOWN ೨೪, ೫e-೩೨೪, '6e-324 denormal 
equates to 5e-324 denormal (Uni)'); / ೨೪, ೫e-೩೨೪, '5e-324 denormal 
equates to 5e-324 denormal (Uni)'); / ೨೪, ೧e-೩೨೩, '9e-324 denormal 
is 1e-323 (Uni)'); / ೨೪, ೦e೦, 'denormal 5e-324 is recognized and is not 
0 (Uni)'); / ೨೪, ೦e೦, '2e-324 denormal is 0e0 (Uni)'); / e-೩೨೪, 
೫e-೩೨೪, '2e-324 denormal equates to 5e-324 denormal (Uni)');

The mis-decoded file contains a © character, hence the mojibake garbage.
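
For readers unfamiliar with the effect, the following short Python sketch
(illustrative only, unrelated to licensecheck's Perl code) shows how the
two UTF-8 bytes of a © character turn into two separate characters when
they are decoded as latin-1, and that the damage is reversible as long as
the text is re-encoded with the same wrong codec:

```python
# Illustrative only: UTF-8 bytes read as latin-1 produce mojibake.
text = "© 2020"
raw = text.encode("utf-8")        # © becomes the two bytes 0xC2 0xA9

garbled = raw.decode("latin-1")   # each byte is mapped to one character
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(repr(repaired))
```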

I would expect --encoding utf8 option to be used to read all files.

To reproduce:
$ git clone https://salsa.debian.org/perl6-team/nqp.git
$ cd nqp
$ git checkout 50fb547df36cc51df65c12503f4db223db39361d # optional
$ cd t/

and then run the commands above.

All the best



Dod


-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.4.0-4-amd64 (SMP w/8 CPU cores)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_WARN, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages licensecheck depends on:
ii  libarray-intspan-perl                  2.003-1
ii  libgetopt-long-descriptive-perl        0.104-1
ii  liblist-someutils-perl                 0.58-1
ii  liblog-any-adapter-screen-perl         0.140-1
ii  liblog-any-perl                        1.708-1
ii  libmoo-perl                            2.003006-1
ii  libmoox-struct-perl                    0.017-1
ii  libnamespace-clean-perl                0.27-1
ii  libpath-iterator-rule-perl             1.014-1
ii  libpath-tiny-perl                      0.108-1
ii  libpod-constants-perl                  0.19-1
ii  libre-engine-re2-perl                  0.13-4+b1
ii  libregexp-pattern-license-perl         3.1.102-1
ii  libregexp-pattern-perl                 0.2.12-1
ii  libscalar-list-utils-perl              1:1.54-1
ii  libsort-key-perl                       1.33-2+b2
ii  libstrictures-perl                     2.06-1
ii  libstring-copyright-perl               0.003006-1
ii  libstring-escape-perl                  2010.002-2
ii  libtry-tiny-perl                       0.30-1
ii  perl                                   5.30.0-9
ii  perl-base [libscalar-list-utils-perl]  5.30.0-9

licensecheck recommends no packages.

Versions of packages licensecheck suggests:
ii  bash-completion  1:2.10-1

-- no debconf information