Re: [perl #35847] File::Find not performing as documented

2005-07-17 Thread Dave Mitchell
On Mon, Jul 11, 2005 at 12:35:36AM -0700, Michael G Schwern wrote:
> I still can't reproduce this on OS X.

I guess the filesystem you have doesn't have the directory nlink property
that File::Find's nlink optimisation relies on.

> On Sun, Jul 10, 2005 at 07:31:02PM +0100, Dave Mitchell wrote:
> > So we either fix the docs, as suggested, or fix the code to stat *every*
> > entry before calling the wanted function. The latter would defeat the
> > purpose of the optimisation, so I vote for the former.
>
> I'd be interested to see, given how much work File::Find does anyway,
> just how much a performance hit fixing this would be.

It makes a huge difference on my laptop (slow disk):

use File::Find;
$File::Find::dont_use_nlink = $ARGV[0];

my $count=0;
find(sub { $count++ }, '/usr');
print "count=$count\n";

running this the first time with arg0 = 0 took about 3 minutes; rerunning
it takes about 10 secs because all the directory reads are cached by the
OS.

Subsequently running with arg0 = 1 (bearing in mind that the directories
are still cached) takes about 6 minutes on both the first and subsequent
runs, presumably because my laptop hasn't got enough free ram to cache
all 300K inodes that have to be read under /usr.

CPU usage is almost zero; the code is completely IO bound.

Perhaps the best approach would be to document that an lstat is no longer
guaranteed, but add a new option to the find() options hash, 'lstat',
that if true, reinstates the guarantee.
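
Independently of any such option, a wanted() callback can reinstate the guarantee for itself by calling lstat before testing _. The following is only a minimal sketch of that idea: the helper name find_pm_files and the temporary test directory are illustrative assumptions, not part of File::Find.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: return the plain files under @roots whose names end
# in .pm. The callback calls lstat itself, so the -f _ test never depends on
# whether File::Find happened to stat the entry.
sub find_pm_files {
    my @roots = @_;
    my @files;
    find(sub {
        lstat $_ or return;    # refresh the stat buffer for this entry
        push @files, $File::Find::name if -f _ && /\.pm\z/;
    }, @roots);
    return @files;
}

# Small demonstration in a throwaway directory (names are arbitrary).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(A.pm B.pm notes.txt)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $dir/$name: $!";
    close $fh;
}
my @found = sort { $a cmp $b } find_pm_files($dir);
print "$_\n" for @found;
```

The extra lstat costs one system call per entry, which is the same IO cost measured above, so it is only worth paying in callbacks that actually use _.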


-- 
Now is the discount of our winter tent
-- sign seen outside camping shop


Re: [perl #35847] File::Find not performing as documented

2005-07-17 Thread Michael G Schwern
On Sun, Jul 17, 2005 at 06:42:40PM +0100, Dave Mitchell wrote:
> Perhaps the best approach would be to document that an lstat is no longer
> guaranteed, but add a new option to the find() options hash, 'lstat',
> that if true, reinstates the guarantee.

Sounds good to me.  Seems like a waste to guarantee it if you're not going
to use it anyway.


-- 
Michael G Schwern [EMAIL PROTECTED] http://www.pobox.com/~schwern
Reality is that which, when you stop believing in it, doesn't go away.
-- Phillip K. Dick


Re: [perl #35847] File::Find not performing as documented

2005-07-11 Thread Michael G Schwern
I still can't reproduce this on OS X.


On Sun, Jul 10, 2005 at 07:31:02PM +0100, Dave Mitchell wrote:
> So we either fix the docs, as suggested, or fix the code to stat *every*
> entry before calling the wanted function. The latter would defeat the
> purpose of the optimisation, so I vote for the former.

I'd be interested to see, given how much work File::Find does anyway,
just how much a performance hit fixing this would be.


-- 
Michael G Schwern [EMAIL PROTECTED] http://www.pobox.com/~schwern
Ahh email, my old friend.  Do you know that revenge is a dish that is best 
served cold?  And it is very cold on the Internet!


Re: [perl #35847] File::Find not performing as documented

2005-07-10 Thread Dave Mitchell
On Fri, Jul 08, 2005 at 10:10:56AM -, Michael G Schwern via RT wrote:
> I am unable to replicate this problem with either 5.8.1, 5.8.6 or
> [EMAIL PROTECTED]  I don't have a 5.8.3 handy to try.

I can:

#!/usr/bin/perl
mkdir 'x', 0777;
open F, '>x/f1';    # create three empty files inside x
open F, '>x/f2';
open F, '>x/f3';
use File::Find;
$File::Find::dont_use_nlink = $ARGV[0];
find sub { printf "%s %s\n", -f _ ? 'file' : 'notf', $File::Find::name },
    'x';
system 'rm -r x';

which gives:

$ ./perl -Ilib /tmp/p1 0
notf x
notf x/f2
notf x/f3
notf x/f1
$ ./perl -Ilib /tmp/p1 1
notf x
file x/f2
file x/f3
file x/f1

It's a problem as far back as 5.003_22 at least.

Basically, when it does the 'nlink check' shortcut on a directory to
determine whether the dir only contains files and no subdirs, it doesn't
bother lstating the individual entries in the dir to determine whether
they're a file or a subdir.
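
The shortcut rests on the traditional Unix convention that a directory's link count is 2 (its own "." entry plus its entry in the parent) plus one for each subdirectory's ".." entry. A rough sketch of such a check is below. The helper names subdirs_from_nlink and can_skip_lstat are hypothetical, and treating a count below 2 as untrustworthy is an assumption of this sketch; it is not a copy of File::Find's internals.

```perl
use strict;
use warnings;

# Hypothetical helper: estimate the number of subdirectories from a
# directory's link count. Returns undef when the count cannot be trusted
# (assumption: anything below 2 means the filesystem does not follow the
# convention).
sub subdirs_from_nlink {
    my ($nlink) = @_;
    return if !defined $nlink || $nlink < 2;
    return $nlink - 2;    # discount "." and the parent's entry
}

# Hypothetical helper: true when the link count says the directory holds
# no subdirectories, so its entries could be reported without an lstat each.
sub can_skip_lstat {
    my ($dir) = @_;
    my $nlink   = (lstat $dir)[3];    # field 3 of lstat is the link count
    my $subdirs = subdirs_from_nlink($nlink);
    return defined $subdirs && $subdirs == 0;
}

my $dir = @ARGV ? $ARGV[0] : '.';
printf "%s: %s\n", $dir,
    can_skip_lstat($dir)
        ? 'link count is 2, entries assumed to be non-directories'
        : 'entries must be lstat-ed individually';
```

When the check succeeds, nothing has stat-ed the individual entries, which is why -f _ in the wanted function reports on a stale buffer rather than on the file being visited.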

So we either fix the docs, as suggested, or fix the code to stat *every*
entry before calling the wanted function. The latter would defeat the
purpose of the optimisation, so I vote for the former.

-- 
But Sidley Park is already a picture, and a most amiable picture too.
The slopes are green and gentle. The trees are companionably grouped at
intervals that show them to advantage. The rill is a serpentine ribbon
unwound from the lake peaceably contained by meadows on which the right
amount of sheep are tastefully arranged. -- Lady Croom - Arcadia


[perl #35847] File::Find not performing as documented

2005-07-08 Thread Michael G Schwern via RT
[EMAIL PROTECTED] - Tue May 17 03:40:07 2005]:
> Not in all cases.  lstat() does not always occur in directories
> that don't have any subdirectories.
>
> linux% cd ~/bin  # Do not run test with . = dir-with-too-many-files
> linux% cat ../temp.pl
> use File::Find;
> $File::Find::dont_use_nlink = $ARGV[0];
> my @files;
> find sub { push @files, $File::Find::name if -f _ && /\.pm$/ }, @INC;
> print join "\n", @files,'';
> linux% perl ../temp.pl 0 | wc -l
> 434
> linux% perl ../temp.pl 1 | wc -l
> 943
> linux% perl -v
> This is perl, v5.8.3 built for i586-linux

I am unable to replicate this problem with either 5.8.1, 5.8.6 or
[EMAIL PROTECTED]  I don't have a 5.8.3 handy to try.



[perl #35847] File::Find not performing as documented

2005-05-17 Thread via RT
# New Ticket Created by  [EMAIL PROTECTED] 
# Please include the string:  [perl #35847]
# in the subject line of all future correspondence about this issue. 
# URL: https://rt.perl.org/rt3/Ticket/Display.html?id=35847 



This is a bug report for perl from [EMAIL PROTECTED],
generated with the help of perlbug 1.34 running under perl v5.8.3.


-

The use of lstat() is not guaranteed in File::Find, contrary
to its documentation.

 FAQ 3.4: How do I find which modules are installed on my system?

 It shouldn't matter.  From perldoc File::Find (v. 1.07):

   * It is guaranteed that an lstat has been called before the
 user's wanted() function is called. This enables fast file
 checks involving  _.


Not in all cases.  lstat() does not always occur in directories
that don't have any subdirectories.

linux% cd ~/bin  # Do not run test with . = dir-with-too-many-files
linux% cat ../temp.pl
use File::Find;
$File::Find::dont_use_nlink = $ARGV[0];
my @files;
find sub { push @files, $File::Find::name if -f _ && /\.pm$/ }, @INC;
print join "\n", @files,'';
linux% perl ../temp.pl 0 | wc -l
434
linux% perl ../temp.pl 1 | wc -l
943
linux% perl -v
This is perl, v5.8.3 built for i586-linux 



linux% diff -u Find.pm.orig Find.pm
--- Find.pm.orig  2004-02-27 08:31:34.0 -0800
+++ Find.pm  2005-05-17 02:39:04.0 -0700
@@ -120,8 +120,11 @@
 
 =item *
 
-It is guaranteed that an I<lstat> has been called before the user's
-C<wanted()> function is called. This enables fast file checks involving S< _>.
+Previous versions of File::Find were guaranteed to call an I<lstat>
+before the user's C<wanted()> function was called, but this is no
+longer the case.  Since this depends on $File::Find::dont_use_nlink, $^O,
+and other factors, fast file checks involving S< _> are not recommended
+unless C<wanted()> calls I<lstat> first.
 
 =item *
 

[Please do not change anything below this line]
-
---
Flags:
category=library
severity=low
---
Site configuration information for perl v5.8.3:

Configured by jms at Tue Feb 17 02:18:23 PST 2004.

Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
  Platform:
osname=linux, osvers=2.4.20-28.9, archname=i586-linux
uname='linux mathras 2.4.20-28.9 #1 thu dec 18 13:46:46 est 2003 i586 i586 
i386 gnulinux '
config_args='-der'
hint=previous, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef 
usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
  Compiler:
cc='cc', ccflags ='-fno-strict-aliasing -I/usr/include -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
optimize='-O3',
cppflags='-fno-strict-aliasing -I/usr/include -I/usr/include/gdbm 
-fno-strict-aliasing -I/usr/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 
-I/usr/include/gdbm'
ccversion='', gccversion='3.2.2 20030222 (Red Hat Linux 3.2.2-5)', 
gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', 
lseeksize=8
alignbytes=4, prototype=define
  Linker and Libraries:
ld='cc', ldflags =' -L/usr/lib'
libpth=/usr/lib /lib
libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.3.2'
  Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fpic', lddlflags='-shared -L/usr/lib'

Locally applied patches:


---
@INC for perl v5.8.3:
/usr/lib/perl5/5.8.3/i586-linux
/usr/lib/perl5/5.8.3
/usr/lib/perl5/site_perl/5.8.3/i586-linux
/usr/lib/perl5/site_perl/5.8.3
/usr/lib/perl5/site_perl
.

---
Environment for perl v5.8.3:
HOME=/home/jms
LANG=en_US
LANGUAGE (unset)
LANGVAR=en_US
LD_LIBRARY_PATH (unset)
LOGDIR (unset)

PATH=/usr/bin:/usr/sbin:/bin:/sbin:/usr/local/bin:/usr/local/sbin:/usr/X11R6/bin:/home/jms/bin
PERL_BADLANG (unset)
SHELL=/bin/tcsh