Bug#951726: licensecheck: option --encoding is not propagated during recursive scan
On Thursday, 20 February 2020 21:29:40 CET you wrote:
> Thanks for an excellently framed bugreport!

You're most welcome :)

> (enabling --verbose also reveals that Licensecheck wrongly treats Encode
> objects as strings, as seen with the HASH string in the warning message)

Which may explain why licensecheck cannot read moar/05-decoder.t, which is a
UTF-8 file with some Cyrillic characters.

All the best
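
To illustrate the hypothesis above (an object being used where an encoding name is expected), here is a rough Python analogy. It is not licensecheck's Perl code: the `Encoding` class and the `can_decode` helper are hypothetical names invented for this sketch, and the sample text is made up. It only shows that a stringified object is not a valid encoding label, so decoding fails even though the data is valid UTF-8.

```python
class Encoding:
    """Hypothetical stand-in for an encoding object wrapping a codec name."""

    def __init__(self, name: str) -> None:
        self.name = name


def can_decode(data: bytes, encoding_label: str) -> bool:
    """Return True if data decodes cleanly using the given encoding label."""
    try:
        data.decode(encoding_label)
        return True
    except (LookupError, UnicodeDecodeError):
        # LookupError: the label does not name any known codec.
        # UnicodeDecodeError: the codec exists but the bytes do not fit it.
        return False


enc = Encoding("utf-8")
data = "Привет ©".encode("utf-8")  # made-up sample with Cyrillic characters

# Using the codec name held by the object works.
ok_with_name = can_decode(data, enc.name)

# Using the stringified object (something like "<...Encoding object at 0x...>")
# fails, because that string is not an encoding name.
ok_with_object_string = can_decode(data, str(enc))

print(ok_with_name, ok_with_object_string)
```

Whether this is what actually happens inside licensecheck is not confirmed in this thread; the sketch only shows why such a mix-up would make a perfectly valid file look unreadable.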
Bug#951726: licensecheck: option --encoding is not propagated during recursive scan
Control: tag -1 confirmed

Hi Dominique,

Quoting Dominique Dumont (2020-02-20 17:15:29)
> While packaging nqp, I've noticed a discrepancy in licensecheck output:
>
> licensecheck correctly reports the absence of information when
> scanning nqp/115-nums.t file from nqp directory:
>
> $ licensecheck --encoding utf8 --copyright --machine --recursive nqp | grep 115
> nqp/115-nums.t UNKNOWN *No copyright*
>
> licensecheck incorrectly reports garbage when scanning nqp/115-nums.t
> file from current directory:
>
> $ licensecheck --encoding utf8 --copyright --machine --recursive . | grep 115
> ./nqp/115-nums.t UNKNOWN ೨೪, ೫e-೩೨೪, '6e-324 denormal
> equates to 5e-324 denormal (Uni)'); / ೨೪, ೫e-೩೨೪, '5e-324
> denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೧e-೩೨೩,
> '9e-324 denormal is 1e-323 (Uni)'); / ೨೪, ೦e೦, 'denormal 5e-324 is
> recognized and is not 0 (Uni)'); / ೨೪, ೦e೦, '2e-324 denormal is 0e0
> (Uni)'); / e-೩೨೪, ೫e-೩೨೪, '2e-324 denormal equates to 5e-324
> denormal (Uni)');
>
> The mis-decoded file contains © character hence the mojibake garbage.
>
> I would expect --encoding utf8 option to be used to read all files.

Thanks for an excellently framed bugreport!

The cause for the difference in output is revealed in --verbose mode:

$ licensecheck --encoding utf8 --copyright --machine --recursive --verbose . | grep '115\|cannot be read'
file moar/05-decoder.t cannot be read with App::Licensecheck=HASH(0x563009ec7500)->encoding; encoding, will try latin-1:
- nqp/115-nums.t header -
./nqp/115-nums.t UNKNOWN ೨೪, ೫e-೩೨೪, '6e-324 denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೫e-೩೨೪, '5e-324 denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೧e-೩೨೩, '9e-324 denormal is 1e-323 (Uni)'); / ೨೪, ೦e೦, 'denormal 5e-324 is recognized and is not 0 (Uni)'); / ೨೪, ೦e೦, '2e-324 denormal is 0e0 (Uni)'); / e-೩೨೪, ೫e-೩೨೪, '2e-324 denormal equates to 5e-324 denormal (Uni)');

Licensecheck chokes on moar/05-decoder.t and re-reads it as latin-1.
...but then licensecheck _continues_ to read following files as latin-1,
which is wrong.

(enabling --verbose also reveals that Licensecheck wrongly treats Encode
objects as strings, as seen with the HASH string in the warning message)

 - Jonas

--
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private
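
The behaviour described in this message (a latin-1 fallback for one file that then sticks for every later file) can be sketched as follows. This is a minimal Python illustration of the pattern, not licensecheck's actual Perl code: the function names are hypothetical, and the file contents are made-up stand-ins (the first one is deliberately invalid UTF-8 just to trigger the fallback).

```python
def decode_all_sticky(files, encoding):
    """Buggy pattern: the fallback overwrites the encoding for later files."""
    results = {}
    for name, data in files:
        try:
            results[name] = data.decode(encoding)
        except UnicodeDecodeError:
            encoding = "latin-1"  # bug: this assignment outlives the current file
            results[name] = data.decode(encoding)
    return results


def decode_all_per_file(files, encoding):
    """Expected pattern: the fallback applies only to the file that failed."""
    results = {}
    for name, data in files:
        try:
            results[name] = data.decode(encoding)
        except UnicodeDecodeError:
            # latin-1 maps every byte to a character, so this cannot fail.
            results[name] = data.decode("latin-1")
    return results


# Made-up contents: the first file fails to decode as UTF-8,
# the second is valid UTF-8 containing non-ASCII characters.
files = [
    ("moar/05-decoder.t", b"\xff\xfe not valid utf-8"),
    ("nqp/115-nums.t", "© ೨೪".encode("utf-8")),
]

sticky = decode_all_sticky(files, "utf-8")
per_file = decode_all_per_file(files, "utf-8")

print(repr(sticky["nqp/115-nums.t"]))    # mis-decoded as latin-1
print(repr(per_file["nqp/115-nums.t"]))  # decoded as UTF-8
```

In the sticky variant the second file is read as latin-1 only because an earlier file failed, which matches the symptom that the result depends on which directory the recursive scan starts from.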
Bug#951726: licensecheck: option --encoding is not propagated during recursive scan
Package: licensecheck
Version: 3.0.44-1
Severity: normal

Dear Maintainer,

While packaging nqp, I've noticed a discrepancy in licensecheck output:

licensecheck correctly reports the absence of information when
scanning nqp/115-nums.t file from nqp directory:

$ licensecheck --encoding utf8 --copyright --machine --recursive nqp | grep 115
nqp/115-nums.t UNKNOWN *No copyright*

licensecheck incorrectly reports garbage when scanning nqp/115-nums.t
file from current directory:

$ licensecheck --encoding utf8 --copyright --machine --recursive . | grep 115
./nqp/115-nums.t UNKNOWN ೨೪, ೫e-೩೨೪, '6e-324 denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೫e-೩೨೪, '5e-324 denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೧e-೩೨೩, '9e-324 denormal is 1e-323 (Uni)'); / ೨೪, ೦e೦, 'denormal 5e-324 is recognized and is not 0 (Uni)'); / ೨೪, ೦e೦, '2e-324 denormal is 0e0 (Uni)'); / e-೩೨೪, ೫e-೩೨೪, '2e-324 denormal equates to 5e-324 denormal (Uni)');

The mis-decoded file contains © character hence the mojibake garbage.

I would expect --encoding utf8 option to be used to read all files.

To reproduce:

$ git clone https://salsa.debian.org/perl6-team/nqp.git
$ cd nqp
$ git checkout 50fb547df36cc51df65c12503f4db223db39361d # optional
$ cd t/

and then run the commands above.

All the best

Dod

-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.4.0-4-amd64 (SMP w/8 CPU cores)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_WARN, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages licensecheck depends on:
ii  libarray-intspan-perl                   2.003-1
ii  libgetopt-long-descriptive-perl         0.104-1
ii  liblist-someutils-perl                  0.58-1
ii  liblog-any-adapter-screen-perl          0.140-1
ii  liblog-any-perl                         1.708-1
ii  libmoo-perl                             2.003006-1
ii  libmoox-struct-perl                     0.017-1
ii  libnamespace-clean-perl                 0.27-1
ii  libpath-iterator-rule-perl              1.014-1
ii  libpath-tiny-perl                       0.108-1
ii  libpod-constants-perl                   0.19-1
ii  libre-engine-re2-perl                   0.13-4+b1
ii  libregexp-pattern-license-perl          3.1.102-1
ii  libregexp-pattern-perl                  0.2.12-1
ii  libscalar-list-utils-perl               1:1.54-1
ii  libsort-key-perl                        1.33-2+b2
ii  libstrictures-perl                      2.06-1
ii  libstring-copyright-perl                0.003006-1
ii  libstring-escape-perl                   2010.002-2
ii  libtry-tiny-perl                        0.30-1
ii  perl                                    5.30.0-9
ii  perl-base [libscalar-list-utils-perl]   5.30.0-9

licensecheck recommends no packages.

Versions of packages licensecheck suggests:
ii  bash-completion  1:2.10-1

-- no debconf information