Hi,

I've come across a bug (which I didn't report, since I fixed it
instead...), which shows up when I grab over a thousand patches with a
single pull.  This cases an error about "too many open files".

The problem is in save_patches in Pull, which calls writePatch once per
patch.  writePatch itself reads the patch file it writes with the
(relatively) new gzReadPatchFileLazily.  Unfortuntely, the memory-efficient
gzReadPatchFileLazily code turns not to be file-descriptor-efficient, as it
opens and leaves open a Handle for each patch written.  :(

I've implemented two fixes for this problem.  I suspect that either alone
would "fix" the problem... I know that the unsafeInterleaveIO fix by itself
is sufficient.  And the other fix (which is helpful even with the
unsafeInterleaveIO fix) may not cover other similar bugs that may exist in
other commands, so I'd prefer to use both fixes, always provided noone can
see a problem with either fix.

The first patch avoids the file-descriptor leak by using an
unsafeInterleaveIO call to avoid opening the file until we actually want to
read from it.  This is potentially unsafe if the file is modified before
that time, but it isn't *very* unsafe.  This also has the benefit that the
file doesn't need to be opened at all if we don't need the patch later.

I've also implemented a separate patch, which *also* should solve the
problem in a somewhat safer, but also less comprehensive manner.  The
second patch makes save_patches in Pull grab the patch ID from the patches
before writing them, rather than reading the patch file afterwards to find
the ID (which was silly).  This means we never use the patch files we read,
so as long as the GC runs fast enough, this patch without the former should
be sufficient to avoid running out of file descriptors when pulling many
patches.


Alternative solutions involve *not* reading the files lazily.  We could
also alleviate the problem by increasing the block size when lazily reading
files.  It looks like we never realloc our buffer, so increasing the block
size perhaps should be combined with a change to reduce the size of the
string to what is actually needed.

Another improvement that would go a long way towards reducing
file-descriptor leakage would be in gzReadFileLazily to check if the amount
read was less than a block size, in which case we know that the file is
finished and can close it immediately rather than closing it only when the
user asks for the second PackedString.  Actually, now that I look at the
code, this would be a *very* good thing to do.  It would mean that we'd
only leave files open if they are bigger than a blocksize, which would be
nice, especially if combined with increasing the block size to 4k or 8k.

Sat Aug 27 15:22:15 EDT 2005  David Roundy <[EMAIL PROTECTED]>
  * add test that triggers "too many open files" bug.
  We just need to pull over 1024 patches at once to trigger this bug on my
  linux system.

Sat Aug 27 15:23:49 EDT 2005  David Roundy <[EMAIL PROTECTED]>
  * avoid reading patches we just wrote when pulling.

Sat Aug 27 15:24:15 EDT 2005  David Roundy <[EMAIL PROTECTED]>
  * use unsafeInterleaveIO in Repository.writePatch to delay opening files.
  This patch solves the "too many open files" bug when pulling many patches.

New patches:

[add test that triggers "too many open files" bug.
David Roundy <[EMAIL PROTECTED]>**20050827192215
 We just need to pull over 1024 patches at once to trigger this bug on my
 linux system.
] 
<
> {
addfile ./tests/pull_many_files.pl
hunk ./tests/pull_many_files.pl 1
+#!/usr/bin/env perl
+
+# Some tests for 'darcs unpull'
+
+use lib qw(lib/perl);
+
+use Test::More qw/no_plan/;
+use Test::Darcs;
+use Shell::Command;
+use strict;
+
+rm_rf  'temp1';
+mkpath 'temp1';
+chdir  'temp1';
+darcs 'init';
+writefile("ALL --ignore-times", "_darcs/prefs/defaults");
+
+touch 'foo';
+darcs 'add foo';
+darcs "record -A x -a -m 'adding foo' foo";
+
+{ # Record lots of patches...
+  my $how_many = 1100;
+  my $n;
+  for ($n=0;$n<$how_many;$n++) {
+    writefile("$n\n", "foo");
+    darcs "record -A x -a -m 'change $n' foo";
+  }
+}
+
+chdir '..';
+rm_rf 'temp2';
+mkpath 'temp2';
+chdir 'temp2';
+darcs 'initialize';
+like( darcs('pull -a ../temp1'),
+      qr/Finished pulling/,
+      "pull works on many patches at a time" );
+
+####
+
+chdir '../';
+rm_rf 'temp1';
+rm_rf 'temp2';
+
+
+sub writefile {
+  my ($contents, $filename) = @_;
+  my $f;
+  open($f, ">$filename") || die "Couldn't open $filename";
+  print $f "$contents\n";
+  close($f);
+}
}
[avoid reading patches we just wrote when pulling.
David Roundy <[EMAIL PROTECTED]>**20050827192349] 
<
> {
hunk ./Pull.lhs 253
 save_patches _ _ (Just []) = return []
 save_patches _ _ Nothing = return []
 save_patches repo opts (Just (p:ps)) =
-    do p' <- (liftM ppt2pipt) (writePatch repo opts p)
+    do let pinf = fromJust $ patch2patchinfo p
+       (_,ptok) <- writePatch repo opts p
        ps' <- save_patches repo opts $ Just ps
hunk ./Pull.lhs 256
-       return (p':ps')
-    where ppt2pipt :: (Patch, PatchToken) -> (PatchInfo, PatchToken)
-          ppt2pipt (patch, pt) = (fromJust (patch2patchinfo patch), pt)
+       return $ (pinf, ptok) : ps'
 \end{code}
 
 \begin{code}
}
[use unsafeInterleaveIO in Repository.writePatch to delay opening files.
David Roundy <[EMAIL PROTECTED]>**20050827192415
 This patch solves the "too many open files" bug when pulling many patches.
] 
<
> {
hunk ./Repository.lhs 30
                     applyToPristine,
                     PatchSet
                   ) where
+import System.IO.Unsafe ( unsafeInterleaveIO )
 import External ( fetchFilePS, Cachable( Cachable ) )
 import Pristine ( Pristine, identifyPristine, nopristine, applyPristine,
                 )
hunk ./Repository.lhs 263
 writePatch (Repo dir _ (DarcsRepository _)) opts patch =
     withCurrentDirectory dir $
         do fp <- DarcsRepo.write_patch opts patch
-           patch' <- gzReadPatchFileLazily fp
+           patch' <- unsafeInterleaveIO $
+                withCurrentDirectory dir $ gzReadPatchFileLazily fp
            return (patch', DarcsPatchToken fp)
 writePatch (Repo dir _ GitRepository) _ patch =
     withCurrentDirectory dir $
}

Context:

[remove hideous malloc hack.
David Roundy <[EMAIL PROTECTED]>**20050818161411] 
[change my AUTHORS email to [EMAIL PROTECTED]
David Roundy <[EMAIL PROTECTED]>**20050808124703] 
[fix mkstemp implementation for win32
Peter Strand <[EMAIL PROTECTED]>**20050810211303] 
[Implement parts of System.Posix.(IO|Files) for win32
[EMAIL PROTECTED] 
[implement RawMode with library functions instead of ffi
[EMAIL PROTECTED] 
[call hsc2hs without output filename argument
[EMAIL PROTECTED] 
[Rename compat.c to c_compat.c to avoid object filename conflict with Compat.hs
[EMAIL PROTECTED] 
[Move atomic_create/sloppy_atomic_create to Compat
Ian Lynagh <[EMAIL PROTECTED]>**20050730141703] 
[Split the raw mode stuff out into its own .hsc file. Windows needs some TLC
Ian Lynagh <[EMAIL PROTECTED]>**20050730134030] 
[Move maybe_relink out of compat.c
Ian Lynagh <[EMAIL PROTECTED]>**20050730131205] 
[Remove is_symlink
Ian Lynagh <[EMAIL PROTECTED]>**20050730122255] 
[Move mkstemp to Compat.hs
Ian Lynagh <[EMAIL PROTECTED]>**20050730020918] 
[Remove unused function
Ian Lynagh <[EMAIL PROTECTED]>**20050730010118] 
[Start Compat.hs, and move stdout_is_a_pipe from compat.c
Ian Lynagh <[EMAIL PROTECTED]>**20050730004829] 
[fix compilation errors with ghc-6.2.2 on win32
Peter Strand <[EMAIL PROTECTED]>**20050809192759] 
[Retain both Git's author and committer.
Juliusz Chroboczek <[EMAIL PROTECTED]>**20050810000820] 
[Move slurping into syncPristine.
Juliusz Chroboczek <[EMAIL PROTECTED]>**20050809232101
 Avoids creating a useless pristine tree when there is none.  Thanks to
 Ian for pointing this out.
] 
[Split --relink into --relink and --relink-pristine.
Juliusz Chroboczek <[EMAIL PROTECTED]>**20050809230951
 Relinking the pristine tree breaks handling of timestamps, which causes
 Darcs to compare file contents.  It should not be used unless you know
 what you are doing.
] 
[fix for bug Ian found in apply.
David Roundy <[EMAIL PROTECTED]>**20050811162558
 This is the bug manifested in the cabal repository.
] 
[Cleanup --verbose handling in repair command
Matt Lavin <[EMAIL PROTECTED]>**20050805020630] 
[clean up Printer.wrap_text.
David Roundy <[EMAIL PROTECTED]>**20050808114844] 
[add several changelog entries.
David Roundy <[EMAIL PROTECTED]>**20050808114800] 
[improve EOD message a tad.
David Roundy <[EMAIL PROTECTED]>**20050807112644
 This change also introduces a "wrapped_text" function in Printer, so we
 won't have to worry so often about manually wrapping lines.
] 
[changed ***DARCS*** to ***END OF DESCRIPTION***
Jason Dagit <[EMAIL PROTECTED]>**20050729032543] 
[remove unused opts argument from apply_patches and apply_patches_with_feedback
Matt Lavin <[EMAIL PROTECTED]>**20050807031038] 
[add code to read patch bundles with added CRs.
David Roundy <[EMAIL PROTECTED]>**20050806222631
 I think this'll address bug #291.
] 
[accept command-line flags in any order.
David Roundy <[EMAIL PROTECTED]>**20050806211828
 In particular, we no longer require that --flags precede filename and
 repository arguments.
] 
[add obliterate command as alias for unpull.
David Roundy <[EMAIL PROTECTED]>**20050804104929] 
[Do not ask confirmation for revert -a
[EMAIL PROTECTED]
 Giving -a as a parameter means the user expects all changes to be reverted.
 Just like for unrevert and record go ahead with it do not ask for confirmation.
] 
[clarify help text for 'd' in SelectPatches.
David Roundy <[EMAIL PROTECTED]>**20050806231117] 
[Add --with-static-libs configure flag for linking static versions of libraries.
[EMAIL PROTECTED] 
[add changelog entry for bug #477.
David Roundy <[EMAIL PROTECTED]>**20050806212148] 
[add description of how to add changelog entries to ChangeLog.README.
David Roundy <[EMAIL PROTECTED]>**20050806225901] 
[Explain the missing ChangeLog
Mark Stosberg <[EMAIL PROTECTED]>**20050526135421
 
 It should be easy for casual users and contributors to view and update the
 ChangeLog.
 
 Providing a README file in the place where people are most likely to look
 provides a very useful clue.
 
 However, it's still not clear to me exactly how the system works, so I have
 left a stub to complete that documentation.
 
     Mark
 
] 
[make repair work on partial repositories.
David Roundy <[EMAIL PROTECTED]>**20050805113001] 
[Use apply_patch_with_feedback from check and repair commands
Matt Lavin <[EMAIL PROTECTED]>**20050805020830] 
[show patch numbers instead of dots on get
Matt Lavin <[EMAIL PROTECTED]>**20050804013649] 
[fix obsolete error explanation in get_extra bug.
David Roundy <[EMAIL PROTECTED]>**20050804130610] 
[simplify fix for bug 463; reuse /// from FilePathUtils
Matt Lavin <[EMAIL PROTECTED]>**20050804021130] 
[Make curl exit with error on failed downloads
[EMAIL PROTECTED] 
[Bump up AC_PREREQ version to 2.59.
[EMAIL PROTECTED] 
[fix for bug 463 (with new test)
Matt Lavin <[EMAIL PROTECTED]>**20050802002116] 
[bump version number, since I just made a release.
David Roundy <[EMAIL PROTECTED]>**20050731190756] 
[Use simpler curl_version() function to get version string.
Kannan Goundan <[EMAIL PROTECTED]>**20050322221027] 
[fix documentation on --reorder-patches.
David Roundy <[EMAIL PROTECTED]>**20050731185406] 
[add changelog entry for bug #224.
David Roundy <[EMAIL PROTECTED]>**20050731133942] 
[fix bug when editing long comment leaves empty file.
David Roundy <[EMAIL PROTECTED]>**20050731133612] 
[changelog entry for bug #189.
David Roundy <[EMAIL PROTECTED]>**20050731132624] 
[TAG 1.0.4pre2
David Roundy <[EMAIL PROTECTED]>**20050731121029] 
Patch bundle hash:
ee8158efc26bc3f711808aa366085f3b9ca258c2
_______________________________________________
darcs-devel mailing list
[email protected]
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel

Reply via email to