Re: [perl #35847] File::Find not performing as documented
On Mon, Jul 11, 2005 at 12:35:36AM -0700, Michael G Schwern wrote:
> I still can't reproduce this on OS X.

I guess the filesystem you have doesn't have the nlinks 2 property.

> On Sun, Jul 10, 2005 at 07:31:02PM +0100, Dave Mitchell wrote:
> > So we either fix the docs, as suggested, or fix the code to stat *every*
> > entry before calling the wanted function. The latter would defeat the
> > purpose of the optimisation, so I vote for the former.
>
> I'd be interested to see, given how much work File::Find does anyway, just
> how much a performance hit fixing this would be.

It makes a huge difference on my laptop (slow disk):

    use File::Find;
    $File::Find::dont_use_nlink = $ARGV[0];
    my $count = 0;
    find(sub { $count++ }, '/usr');
    print "count=$count\n";

Running this the first time with arg0 = 0 took about 3 minutes; rerunning it takes about 10 secs because all the directory reads are cached by the OS. Subsequently running with arg0 = 1 (bearing in mind that the directories are still cached) takes about 6 minutes on both the first and subsequent runs, presumably because my laptop hasn't got enough free RAM to cache all 300K inodes that have to be read under /usr. CPU usage is almost zero; the code is completely IO bound.

Perhaps the best approach would be to document that an lstat is no longer guaranteed, but add a new option to the find() options hash, 'lstat', that, if true, reinstates the guarantee.

-- 
Now is the discount of our winter tent
    -- sign seen outside camping shop
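The 'lstat' option proposed above is only a suggestion in this thread. As a rough sketch of what it would guarantee, the same effect can be had today by wrapping the caller's wanted() so that an lstat always runs first. The helper name find_with_lstat is hypothetical and is not part of File::Find; only find() and its documented 'wanted' key are real API here.

```perl
use strict;
use warnings;
use File::Find ();
use File::Temp qw(tempdir);

# Hypothetical helper (NOT part of File::Find): emulates the proposed
# 'lstat' option by lstat'ing every entry before the caller's wanted() runs.
sub find_with_lstat {
    my ($wanted, @dirs) = @_;
    File::Find::find(
        {
            wanted => sub {
                # By default find() has chdir'ed into the directory,
                # so $_ holds the basename of the current entry.
                lstat($_) or return;    # entry vanished or unreadable: skip it
                $wanted->();            # "_" now refers to this entry's lstat
            },
        },
        @dirs,
    );
    return;
}

# Small demonstration on a scratch directory holding three empty files.
my $demo_dir = tempdir(CLEANUP => 1);
for my $name (qw(f1 f2 f3)) {
    open my $fh, '>', "$demo_dir/$name" or die "cannot create $name: $!";
    close $fh;
}

my $plain_files = 0;
find_with_lstat(sub { $plain_files++ if -f _ }, $demo_dir);
print "plain files seen: $plain_files\n";
```

The cost is the one measured above: every entry is lstat'ed, so the nlink optimisation no longer saves any I/O for callers that go through the wrapper.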
Re: [perl #35847] File::Find not performing as documented
On Sun, Jul 17, 2005 at 06:42:40PM +0100, Dave Mitchell wrote:
> Perhaps the best approach would be to document that an lstat is no longer
> guaranteed, but add a new option to the find() options hash, 'lstat',
> that, if true, reinstates the guarantee.

Sounds good to me. Seems like a waste to guarantee it if you're not going to use it anyway.

-- 
Michael G Schwern [EMAIL PROTECTED] http://www.pobox.com/~schwern
Reality is that which, when you stop believing in it, doesn't go away.
    -- Phillip K. Dick
Re: [perl #35847] File::Find not performing as documented
I still can't reproduce this on OS X.

On Sun, Jul 10, 2005 at 07:31:02PM +0100, Dave Mitchell wrote:
> So we either fix the docs, as suggested, or fix the code to stat *every*
> entry before calling the wanted function. The latter would defeat the
> purpose of the optimisation, so I vote for the former.

I'd be interested to see, given how much work File::Find does anyway, just how much a performance hit fixing this would be.

-- 
Michael G Schwern [EMAIL PROTECTED] http://www.pobox.com/~schwern
Ahh email, my old friend. Do you know that revenge is a dish that is best served cold? And it is very cold on the Internet!
Re: [perl #35847] File::Find not performing as documented
On Fri, Jul 08, 2005 at 10:10:56AM -, Michael G Schwern via RT wrote:
> I am unable to replicate this problem with either 5.8.1, 5.8.6 or
> [EMAIL PROTECTED] I don't have a 5.8.3 handy to try.

I can:

    #!/usr/bin/perl
    mkdir 'x', 0777;
    open F, '>x/f1';
    open F, '>x/f2';
    open F, '>x/f3';
    use File::Find;
    $File::Find::dont_use_nlink = $ARGV[0];
    find sub {
        printf "%s %s\n", -f _ ? 'file' : 'notf', $File::Find::name
    }, 'x';
    system 'rm -r x';

which gives:

    $ ./perl -Ilib /tmp/p1 0
    notf x
    notf x/f2
    notf x/f3
    notf x/f1
    $ ./perl -Ilib /tmp/p1 1
    notf x
    file x/f2
    file x/f3
    file x/f1

It's a problem as far back as 5.003_22 at least. Basically, when it does the 'nlink check' shortcut on a directory to determine whether the dir only contains files and no subdirs, it doesn't bother lstating the individual entries in the dir to determine whether they're a file or a subdir. The "_" filehandle in wanted() therefore still refers to an earlier lstat (of the directory, judging by the output above) rather than to the current entry.

So we either fix the docs, as suggested, or fix the code to stat *every* entry before calling the wanted function. The latter would defeat the purpose of the optimisation, so I vote for the former.

-- 
But Sidley Park is already a picture, and a most amiable picture too.
The slopes are green and gentle. The trees are companionably grouped at
intervals that show them to advantage. The rill is a serpentine ribbon
unwound from the lake peaceably contained by meadows on which the right
amount of sheep are tastefully arranged.
    -- Lady Croom - Arcadia
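The 'nlink check' described above relies on a property of traditional Unix filesystems: a directory's link count is 2 plus the number of its subdirectories, so a count of exactly 2 means "no subdirs, nothing to descend into". The sketch below only prints the link count of a scratch directory before and after a subdirectory is created. The helper name dir_nlink is illustrative, and no particular output is claimed, because some filesystems do not maintain this property (which is what $File::Find::dont_use_nlink exists for).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Illustrative helper: link count of a directory, or undef if lstat fails.
sub dir_nlink {
    my ($dir) = @_;
    my @st = lstat($dir) or return;
    return $st[3];    # element 3 of the lstat list is nlink
}

my $top = tempdir(CLEANUP => 1);
printf "no subdirs: nlink=%d\n", dir_nlink($top);

mkdir "$top/sub" or die "mkdir failed: $!";
printf "one subdir: nlink=%d\n", dir_nlink($top);
```

Where the two numbers printed are 2 and 3, the shortcut is valid for that filesystem; where they are not, File::Find has to lstat every entry to tell files from subdirectories.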
[perl #35847] File::Find not performing as documented
> [EMAIL PROTECTED] - Tue May 17 03:40:07 2005]:
>
> Not in all cases. lstat() does not always occur in directories that
> don't have any subdirectories.
>
> linux% cd ~/bin # Do not run test with . = dir-with-too-many-files
> linux% cat ../temp.pl
> use File::Find;
> $File::Find::dont_use_nlink = $ARGV[0];
> my @files;
> find sub { push @files, $File::Find::name if -f _ && /\.pm$/ }, @INC;
> print join "\n", @files, '';
> linux% perl ../temp.pl 0 | wc -l
> 434
> linux% perl ../temp.pl 1 | wc -l
> 943
> linux% perl -v
> This is perl, v5.8.3 built for i586-linux

I am unable to replicate this problem with either 5.8.1, 5.8.6 or [EMAIL PROTECTED] I don't have a 5.8.3 handy to try.
[perl #35847] File::Find not performing as documented
# New Ticket Created by [EMAIL PROTECTED]
# Please include the string: [perl #35847]
# in the subject line of all future correspondence about this issue.
# URL: https://rt.perl.org/rt3/Ticket/Display.html?id=35847

This is a bug report for perl from [EMAIL PROTECTED],
generated with the help of perlbug 1.34 running under perl v5.8.3.

-----------------------------------------------------------------

The use of lstat() is not guaranteed in File::Find, contrary to its documentation.

FAQ 3.4: How do I find which modules are installed on my system?

It shouldn't matter.

From perldoc File::Find (v. 1.07):

    * It is guaranteed that an lstat has been called before the user's
      wanted() function is called. This enables fast file checks
      involving _.

Not in all cases. lstat() does not always occur in directories that don't have any subdirectories.

linux% cd ~/bin # Do not run test with . = dir-with-too-many-files
linux% cat ../temp.pl
use File::Find;
$File::Find::dont_use_nlink = $ARGV[0];
my @files;
find sub { push @files, $File::Find::name if -f _ && /\.pm$/ }, @INC;
print join "\n", @files, '';
linux% perl ../temp.pl 0 | wc -l
434
linux% perl ../temp.pl 1 | wc -l
943
linux% perl -v
This is perl, v5.8.3 built for i586-linux

linux% diff -u Find.pm.orig Find.pm
--- Find.pm.orig 2004-02-27 08:31:34.0 -0800
+++ Find.pm 2005-05-17 02:39:04.0 -0700
@@ -120,8 +120,11 @@
 
 =item *
 
-It is guaranteed that an I<lstat> has been called before the user's
-C<wanted()> function is called. This enables fast file checks involving S< _>.
+Previous versions of File::Find were guaranteed to call an I<lstat>
+before the user's C<wanted()> function was called, but this is no
+longer the case. Since this depends on $File::Find::dont_use_nlink, $^O,
+and other factors, fast file checks involving S< _> are not recommended
+unless C<wanted()> calls I<lstat> first.
 
 =item *
 

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=library
    severity=low
---
Site configuration information for perl v5.8.3:

Configured by jms at Tue Feb 17 02:18:23 PST 2004.

Summary of my perl5 (revision 5.0 version 8 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.4.20-28.9, archname=i586-linux
    uname='linux mathras 2.4.20-28.9 #1 thu dec 18 13:46:46 est 2003 i586 i586 i386 gnulinux '
    config_args='-der'
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -I/usr/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O3',
    cppflags='-fno-strict-aliasing -I/usr/include -I/usr/include/gdbm -fno-strict-aliasing -I/usr/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm'
    ccversion='', gccversion='3.2.2 20030222 (Red Hat Linux 3.2.2-5)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/lib'
    libpth=/usr/lib /lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/lib'

Locally applied patches:

---
@INC for perl v5.8.3:
    /usr/lib/perl5/5.8.3/i586-linux
    /usr/lib/perl5/5.8.3
    /usr/lib/perl5/site_perl/5.8.3/i586-linux
    /usr/lib/perl5/site_perl/5.8.3
    /usr/lib/perl5/site_perl
    .

---
Environment for perl v5.8.3:
    HOME=/home/jms
    LANG=en_US
    LANGUAGE (unset)
    LANGVAR=en_US
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/bin:/usr/sbin:/bin:/sbin:/usr/local/bin:/usr/local/sbin:/usr/X11R6/bin:/home/jms/bin
    PERL_BADLANG (unset)
    SHELL=/bin/tcsh