I am trying to find a way to get self-contained par apps to approach the
speed of PerlApp binaries on Windows.  I ran a few tests on a sample
application to compare performance on my system.  Here are my results:

First, I made a standard PerlApp self-contained executable.  PerlApp
unpacks only the shared libraries, into a single user-specific directory
($TEMP/pdk-<user>), and does not remove them when it exits.  The files
are named after their MD5 checksums, so a changed library gets a new
name instead of colliding with an older copy.  The one exception I
noticed was perl58.dll, which was placed under its original name in a
checksum-named subdirectory.  Presumably the same must be done for any
bundled dependency that the run-time linker looks up by name.  The
timing figures for this test were:

            First call:   0.6 s
            Second call:  0.3 s
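
The PerlApp naming scheme is easy to sketch in Perl: derive the cache
file name from an MD5 of the library's bytes and write the file only if
it is not already there, so a second run reuses it.  This is a minimal
illustration, not PerlApp's code; the helper name cached_path and the
<md5><ext> file layout are assumptions.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: store $bytes as <cache_dir>/<md5 of bytes><ext>.
# Returns the path plus a flag that is 1 if the file was written now and
# 0 if an identical copy was already cached and simply reused.
sub cached_path {
    my ($cache_dir, $bytes, $ext) = @_;
    make_path($cache_dir) unless -d $cache_dir;
    my $path = File::Spec->catfile($cache_dir, md5_hex($bytes) . $ext);
    return ($path, 0) if -e $path;    # same contents already on disk
    open my $out, '>', $path or die "Cannot write $path: $!";
    binmode $out;                     # libraries are binary: no CRLF translation
    print {$out} $bytes;
    close $out or die "Cannot close $path: $!";
    return ($path, 1);
}
```

Because the name is a function of the contents, two runs of the same
binary agree on the path without any bookkeeping, and a rebuilt library
simply lands under a new name.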

Next I tried making a par binary that unpacks to an application-specific
temp directory ($TEMP/par-<user>/cache-<sum>) and does not clean up
afterwards.  In this mode, par unpacks the whole zip file as well as all
of the bootstrap files.  This results in:

            First call:   8.5 s
            Second call:  0.5 s

The next test I ran was a self-contained executable made with par and
-clean.  This case unpacks some of the files on every run and removes
the temporary directory when done:

            First call:   2.3 s
            Second call:  2.3 s (no reuse)

These tests indicate that there is a severe penalty for unpacking the
whole file, but once that is done the second run is pretty fast.  Taking
a cue from the PerlApp approach, I ran a few additional experiments.  To
get a baseline for the cost of expanding the zip file with Info-ZIP, I
tried expanding just the .dlls, then everything:

            .dlls only:   0.5 s
            Everything:   4.9 s
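
To reproduce first-call and second-call figures like these, one option
is to time two consecutive runs of the same command with the core
Time::HiRes module.  This is only a sketch: how the numbers above were
actually measured isn't stated, and the helper names time_run and
first_and_second are hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run a command (list form, no shell) and return its wall-clock time in seconds.
sub time_run {
    my @cmd   = @_;
    my $start = time();
    system(@cmd) == 0 or die "'@cmd' failed with status $?";
    my $elapsed = time() - $start;
    return $elapsed;
}

# The first run pays for unpacking into an empty cache; the second run
# should find the cache already populated.
sub first_and_second {
    my @cmd    = @_;
    my $first  = time_run(@cmd);
    my $second = time_run(@cmd);
    return ($first, $second);
}
```

For example, printf "First call: %.1f s\nSecond call: %.1f s\n",
first_and_second('myapp.exe'); with myapp.exe standing in for the packed
binary.  Clear the cache directory beforehand so the first run really
starts cold.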

That's pretty interesting: expanding only the .dlls is roughly ten times
cheaper than expanding everything, so we might be able to speed things
up a lot by unpacking only the .dlls.  In my next experiment I modified
par to disable the calls to _extract_inc in PAR.pm, so that it no longer
unpacks everything when running with a cache directory.  This improved
things a lot:

            First call:   1.7 s
            Second call:  0.5 s
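
The rule that decides which embedded files have to reach the disk can
be pulled out into a small function.  This sketch restates the extension
and shlib/ tests from the par.pl patch: anything that is not Perl source
(.pm, .pl, .ix, .al), and anything under shlib/, is written out for the
OS loader, and the rest can stay in memory.  The function name
destination and its 'disk'/'memory' return values are made up for
illustration.

```perl
use strict;
use warnings;

# Classify an embedded file by name, mirroring the $isLibrary/$isShlib
# tests in the par.pl patch.  Returns 'disk' for files the OS loader must
# see (shared libraries and anything under shlib/), 'memory' otherwise.
sub destination {
    my ($fullname) = @_;
    my ($ext)      = $fullname =~ m{(?:.*/)?.*(\..*)};   # same extension regex as par.pl
    my $is_library = defined($ext) && $ext !~ /\.(?:pm|pl|ix|al)$/i;
    my $is_shlib   = $fullname =~ m{^/?shlib/};
    return ($is_library || $is_shlib) ? 'disk' : 'memory';
}
```

For instance, destination('auto/Foo/Foo.dll') returns 'disk' and
destination('lib/Foo/Bar.pm') returns 'memory'.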

This is pretty good, but we're still unpacking a lot of .pm files that
could be loaded directly from memory.  It turns out that files are
unpacked in two places: the bootstrap files embedded in the binary, and
everything else inside the .par file.  The .pm files in the .par can be
loaded directly from memory by using PerlIO::scalar to open filehandles
on in-memory buffers.  Some of the bootstrap files must be saved to disk
to get far enough along to load the DynaLoader and PerlIO::scalar
modules, but the rest can be loaded from memory.  The resulting cache
directory contains only a small subset of the .pm files plus all of the
.dlls.  Also, par was extracting files before testing whether they
already existed in the cache.  Checking first and deferring the read
avoids unnecessary I/O (it also avoids the "permission denied" problems
that sometimes crop up when the same binary is called multiple times).
With these changes in place, I got the following numbers:

            First call:   1.2 s
            Second call:  0.5 s
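
The "check before reading" change boils down to skipping a payload with
a relative seek instead of reading it when its file is already cached,
which is what the patched par.pl does with seek _FH, $size, 1.  The
sketch walks a simplified stream in which each record is a pack('N')
name length, the name, a pack('N') payload length, and the payload; that
layout, the walk_records name, and the $is_cached callback are
assumptions for illustration, not par.pl's exact format.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);

# Read records of the form: N-packed name length, name, N-packed payload
# length, payload.  Payloads for names that $is_cached->($name) reports as
# already on disk are skipped with a relative seek; the rest are read into
# memory and returned in a hash keyed by name.
sub walk_records {
    my ($fh, $is_cached) = @_;
    my %loaded;
    my $len;
    while (read($fh, $len, 4) == 4) {
        my $name_len = unpack('N', $len);
        read($fh, my $name, $name_len) == $name_len
            or die "truncated record name";
        read($fh, $len, 4) == 4 or die "truncated length for $name";
        my $size = unpack('N', $len);
        if ($is_cached->($name)) {
            seek($fh, $size, SEEK_CUR) or die "seek failed for $name: $!";
        }
        else {
            read($fh, $loaded{$name}, $size) == $size
                or die "short read for $name";
        }
    }
    return \%loaded;
}
```

With every .dll reported as cached, only the Perl sources are read into
memory and the library payloads are never touched.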

Rerunning the -clean case gives:

            First call:   1.6 s
            Second call:  1.6 s

This is a significant improvement over the original cached and uncached
cases and probably approaches the limit of what can be achieved with the
current zip implementation and application structure.  So what's
missing?

There are probably ways to streamline the bootstrapping so that no .pm
files need to be unpacked to disk at all.  Further tweaking of the
startup code might allow the PerlIO::scalar and DynaLoader modules to be
loaded explicitly.
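
For reference, loading a module from memory needs only an @INC hook
that returns a filehandle opened on a scalar (the PerlIO::scalar layer),
which is how the patched par.pl loader hands back $filename->{buf}.  A
standalone sketch, with a hypothetical %in_memory table and Demo::Hello
module standing in for the files embedded in the binary:

```perl
use strict;
use warnings;

# Module sources keyed by the path form that require() looks up.
# (Hypothetical contents standing in for files embedded in a .par.)
my %in_memory = (
    'Demo/Hello.pm' => "package Demo::Hello;\nsub greet { 'hello from memory' }\n1;\n",
);

# A code reference in @INC is called as $hook->($hook, $path).  Returning a
# filehandle makes require() compile the module from that handle; here the
# handle is opened on a scalar, so nothing is written to disk.
unshift @INC, sub {
    my (undef, $path) = @_;
    return unless exists $in_memory{$path};    # not ours: let @INC continue
    open my $fh, '<', \$in_memory{$path} or die "Cannot open $path: $!";
    return $fh;
};

require Demo::Hello;
print Demo::Hello::greet(), "\n";    # prints "hello from memory"
```

In a packed binary the hook has to be installed before the first require
of an embedded module, and PerlIO::scalar itself must already be
loadable, which is why a few bootstrap files still end up on disk.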

My current deltas are a bit of a hack and don't preserve the original
_extract_inc behavior.  Ideally, this would be under the control of a
switch.  I'd like to get feedback from the list about the best way to
integrate it.  Also, if you can think of problems with this approach
that I haven't considered, that would be helpful too.  I've verified
that the patch works on both Linux (AS Perl 5.8.7) and Windows XP (AS
Perl 5.8.6).

--Scott

Index: PAR/lib/PAR.pm
===================================================================
--- PAR/lib/PAR.pm        (revision 542)
+++ PAR/lib/PAR.pm     (working copy)
@@ -367,7 +367,7 @@
         # XXX - handle META.yml here!
         push @PAR_INC, unpar($progname, undef, undef, 1);
 
-        _extract_inc($progname) unless $ENV{PAR_CLEAN};
+#        _extract_inc($progname) unless $ENV{PAR_CLEAN};
 
         my $zip = $LibCache{$progname};
         my $member = _first_member( $zip,
@@ -472,7 +472,7 @@
         PAR::Heavy::_init_dynaloader();
         
         # XXX - handle META.yml here!
-        _extract_inc($opt->{file}) unless $ENV{PAR_CLEAN};
+#        _extract_inc($opt->{file}) unless $ENV{PAR_CLEAN};
         
         my $zip = $LibCache{$opt->{file}};
         my $member = _first_member( $zip,
@@ -865,15 +865,10 @@
 
     return $member if $member_only;
 
-    my ($fh, $is_new);
-    ($fh, $is_new, $LastTempFile) = _tempfile($member->crc32String . ".pm");
-    die "Bad Things Happened..." unless $fh;
+    my ($fh, $contents);
+    $contents = $member->contents();
+    open($fh, "<", \$contents) || die "Bad Things Happened...";
 
-    if ($is_new) {
-        $member->extractToFileHandle($fh);
-        seek ($fh, 0, 0);
-    }
-
     return $fh;
 }
 
Index: PAR-Packer/script/par.pl
===================================================================
--- PAR-Packer/script/par.pl        (revision 542)
+++ PAR-Packer/script/par.pl     (working copy)
@@ -273,38 +273,39 @@
         outs(qq(Unpacking file "$fullname"...));
         my $crc = ( $fullname =~ s|^([a-f\d]{8})/|| ) ? $1 : undef;
         my ($basename, $ext) = ($buf =~ m|(?:.*/)?(.*)(\..*)|);
+        my $isLibrary = (defined($ext) and $ext !~ /\.(?:pm|pl|ix|al)$/i);
+        my $isShlib = ($fullname =~ m|^/?shlib/|);
 
         read _FH, $buf, 4;
-        read _FH, $buf, unpack("N", $buf);
+        my $size = unpack("N", $buf);
 
-        if (defined($ext) and $ext !~ /\.(?:pm|pl|ix|al)$/i) {
+        if ($isLibrary || $isShlib) {
             my ($out, $filename) = _tempfile($ext, $crc);
+            if ($isLibrary) {
+                $PAR::Heavy::FullCache{$fullname} = $filename;
+                $PAR::Heavy::FullCache{$filename} = $fullname;
+            }
             if ($out) {
-                binmode($out);
+                read _FH, $buf, $size;
                 print $out $buf;
                 close $out;
                 chmod 0755, $filename;
+                outs(qq(Unpacked "$fullname" to "$filename"));
+            } else {
+                seek _FH, $size, 1;
+                outs(qq(Skipped "$fullname" as it already exists in "$filename"));
             }
-            $PAR::Heavy::FullCache{$fullname} = $filename;
-            $PAR::Heavy::FullCache{$filename} = $fullname;
-        }
-        elsif ( $fullname =~ m|^/?shlib/| and defined $ENV{PAR_TEMP} ) {
-            # should be moved to _tempfile()
-            my $filename = "$ENV{PAR_TEMP}/$basename$ext";
-            outs("SHLIB: $filename\n");
-            open my $out, '>', $filename or die $!;
-            binmode($out);
-            print $out $buf;
-            close $out;
-        }
-        else {
+        } else {
+            read _FH, $buf, $size;
             $require_list{$fullname} =
             $PAR::Heavy::ModuleCache{$fullname} = {
                 buf => $buf,
                 crc => $crc,
                 name => $fullname,
             };
+            outs(qq(Unpacked "$fullname" to memory));
         }
+
         read _FH, $buf, 4;
     }
     # }}}
@@ -323,10 +324,19 @@
             delete $require_list{$key} if defined($key);
         } or return;
 
+        my $havePerlIOScalar = defined($INC{"PerlIO/scalar.pm"});
         $INC{$module} = "/loader/$filename/$module";
 
+        if ($havePerlIOScalar && defined($filename->{buf})) {
+            outs("Loading module '$module' from memory");
+            open my $fh, '<', \$filename->{buf} or die $!;
+            binmode($fh);
+            return $fh;
+        }
+
         if ($ENV{PAR_CLEAN} and defined(&IO::File::new)) {
             my $fh = IO::File->new_tmpfile or die $!;
+            outs("Loading module '$module' from a temp file");
             binmode($fh);
             print $fh $filename->{buf};
             seek($fh, 0, 0);
@@ -339,6 +349,7 @@
                 print $out $filename->{buf};
                 close $out;
             }
+            outs("Loading module '$module' from '$name'");
             open my $fh, '<', $name or die $!;
             binmode($fh);
             return $fh;
@@ -355,6 +366,8 @@
     require Carp::Heavy;
     require Exporter::Heavy;
     PAR::Heavy::_init_dynaloader();
+    require PerlIO;
+    require PerlIO::scalar;
 
     # now let's try getting helper modules from within
     require IO::File;
@@ -680,8 +693,12 @@
             }x;
             my $extract_name = $1;
             my $dest_name = File::Spec->catfile($ENV{PAR_TEMP}, $extract_name);
-            $member->extractToFileNamed($dest_name);
-            outs(qq(Extracting "$member_name" to "$dest_name"));
+            if (! -f $dest_name) {
+                $member->extractToFileNamed($dest_name);
+                outs(qq(Extracting "$member_name" to "$dest_name"));
+            } else {
+                outs(qq(Skipping "$member_name" since it already exists at "$dest_name"));
+            }
         }
     }
     # }}}
@@ -735,6 +752,8 @@
     require PAR::Heavy;
     require PAR::Dist;
     require PAR::Filter::PodStrip;
+    require PerlIO;
+    require PerlIO::scalar;
     eval { require Cwd };
     eval { require Win32 };
     eval { require Win32::Process };