Sun Aug 02 20:04:00 2015: Request 106142 was acted upon.
Transaction: Correspondence added by SLAFFAN
Queue: Module-ScanDeps
Subject: [Patch] Preload dependencies for PDL and PDL::NiceSlice
Broken in: (no value)
Severity: (no value)
Owner: Nobody
Requestors: slaf...@cpan.org
Status: open
Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=106142 >

On Sun Aug 02 18:02:33 2015, RSCHUPP wrote:
> On 2015-08-02 00:48:42, SLAFFAN wrote:
> > Instrumenting _glob_in_inc to print to stdout whenever unico[rd]e is
> > passed as the subdir argument has no effect, so I assume the utf8.pm
> > preload sub is not being run for the above preload rules.
>
> Thanks for investigating. I tried to figure out at what point utf8_heavy.pl
> comes into play. For that I prepended this to your sample script
>
> BEGIN
> {
>     # insert spy CODE into require's module lookup
>     unshift @INC, sub
>     {
>         my ($self, $pm) = @_;
>         print STDERR "# require $pm\n";
>         my ($package, $filename, $line) = caller;
>         print STDERR "# from $package ($filename:$line)\n";
>         return; # i.e. take a pass
>     };
> }
>
> This intercepts any (explicit or implicit) "require", prints out what is
> required and from where, and then resumes "normal" processing. Here's the
> output
>
> # require PDL.pm
> # from main (/home/roderich/todo/PAR/Module-ScanDeps/shawn.pl:15)
> # require PDL/Core.pm
> # from main ((eval 1):6)
> # require PDL/Types.pm
> # from PDL::Core (/usr/lib/x86_64-linux-gnu/perl5/5.22/PDL/Core.pm:223)
> # require Carp.pm
> # from PDL::Types (/usr/lib/x86_64-linux-gnu/perl5/5.22/PDL/Types.pm:6)
> # require strict.pm
> # from Carp (/usr/share/perl/5.22/Carp.pm:4)
> # require warnings.pm
> # from Carp (/usr/share/perl/5.22/Carp.pm:5)
> # require Exporter.pm
> # from Carp (/usr/share/perl/5.22/Carp.pm:99)
> # require overload.pm
> # from PDL::Type (/usr/lib/x86_64-linux-gnu/perl5/5.22/PDL/Types.pm:428)
> # require overloading.pm
> # from overload (/usr/share/perl/5.22/overload.pm:83)
> # require warnings/register.pm
> # from overload (/usr/share/perl/5.22/overload.pm:144)
> # require Exporter/Heavy.pm
> # from Exporter (/usr/share/perl/5.22/Exporter.pm:16)
> # require PDL/Exporter.pm
> # from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):314)
> # require DynaLoader.pm
> # from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):315)
> # require Config.pm
> # from DynaLoader (/usr/lib/x86_64-linux-gnu/perl/5.22/DynaLoader.pm:21)
> # require vars.pm
> # from Config (/usr/lib/x86_64-linux-gnu/perl/5.22/Config.pm:11)
> # require Scalar/Util.pm
> # from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):1000)
> # require List/Util.pm
> # from Scalar::Util (/usr/lib/x86_64-linux-gnu/perl/5.22/Scalar/Util.pm:11)
> # require XSLoader.pm
> # from List::Util (/usr/lib/x86_64-linux-gnu/perl/5.22/List/Util.pm:21)
> # require utf8.pm
> # from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):1028)
> # require utf8_heavy.pl
> # from utf8 (/usr/share/perl/5.22/utf8.pm:16)
> # require re.pm
> # from utf8 (/usr/share/perl/5.22/utf8_heavy.pl:4)
> # require unicore/Heavy.pl
> # from utf8 (/usr/share/perl/5.22/utf8_heavy.pl:185)
> # require unicore/lib/Alpha/Y.pl
> # require PDL/Options.pm
> # from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):3288)
> # require Fcntl.pm
> # from PDL::Core (Basic/Core/Core.pm.PL (i.e. PDL::Core.pm):4167)
> ...
>
> utf8.pm and utf8_heavy.pl are actually loaded from PDL::Core.pm.
> The funny "Basic/Core/Core.pm.PL (i.e. PDL::Core.pm)" is caused by the fact
> that PDL/Core.pm is a generated file with some
>
> # line 123 "Basic/Core/Core.pm.PL (i.e. PDL::Core.pm)"
>
> lines in it. And the offending line is
>
> if $value =~ /e\p{IsAlpha}/ or $value =~ /\p{IsAlpha}e/;
>
> There's no explicit mention of utf8.pm here - the code uses a Unicode
> property in a regular expression. utf8.pm (at least in Perl 5.22) doesn't do
> anything except set up an AUTOLOAD sub that will require utf8_heavy.pl when
> it is run. (If you check $utf8::AUTOLOAD when our @INC spy is called, its
> value is "utf8::SWASHNEW".)
>
> So the whole utf8_heavy.pl + unico[dr]e shebang is triggered on demand
> whenever some Unicode feature of Perl is requested, e.g. a Unicode property
> in a regex, probably lots of others.
>
> I don't think it's feasible to try to detect this by static analysis.
> Should we just add this stuff (at least 4 MB spread over more than 400 files)
> to _every_ packed executable?
>
> Cheers, Roderich

Thanks Roderich,

The size issue rears its head once more...

It would also be a Herculean task to get static scanning to detect all such cases (although maybe PPI could be leveraged if someone ever has the tuits - https://metacpan.org/pod/PPI::Token::Regexp ).

Perhaps another flag could be added to pp for the cases where the code does not explicitly call for unicode, but it is needed for a packed executable to work. pp --unicode?

I also now think that this is the root cause of an issue I've been working around for a while using the code below. I use the pp -x flag when building, and set an environment variable in my script before calling pp.

    if ($ENV{BDV_PP_BUILDING}) {
        use 5.016;
        use feature 'unicode_strings';
        my $string = "sp_self_only() and \N{WHITE SMILING FACE}";
        $string =~ /\bsp_self_only\b/;
    }

Given that, it should be possible to statically scan for the various permutations of /use feature 'unicode_/ to detect unicode_strings and unicode_eval. If someone is using those features in their code then they need the extra libraries.
https://metacpan.org/pod/feature#The-unicode_strings-feature

Such scanning would not detect statements that span multiple lines, as per the documentation caveats. A "pp --unicode" style flag would still be needed in such cases.
https://metacpan.org/pod/Module::ScanDeps#CAVEATS

WRT the pp flag, maybe a more general approach would be something that parallels the feature pragma, e.g.

    pp --feature=unicode_strings,unicode_eval
    pp --feature=":5.12"

Regards,
Shawn.
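
For illustration only, the single-line scan for "use feature 'unicode_...'" described above might look roughly like the sketch below. The sub name unicode_features_in is hypothetical and is not part of Module::ScanDeps. The sketch assumes the whole statement sits on one line, and it deliberately ignores feature bundles such as ":5.12" and version declarations such as "use 5.016", which also switch on unicode_strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Module::ScanDeps function): returns the sorted,
# de-duplicated unicode_* features named on single-line "use feature"
# statements in the given source text.
sub unicode_features_in {
    my ($source) = @_;
    my %seen;
    for my $line (split /\n/, $source) {
        # Only look at lines that start a "use feature ..." statement.
        next unless $line =~ /^\s*use\s+feature\b(.*)/;
        my $args = $1;
        # Handles both quoted and qw() forms; list-context /g returns
        # every captured feature name on the line.
        $seen{$_} = 1 for $args =~ /\b(unicode_(?:strings|eval))\b/g;
    }
    return sort keys %seen;
}
```

A scanner could treat a non-empty return value as the signal to add the utf8_heavy.pl and unicore files; anything this misses would still need the command-line flag.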