On Fri, 14 Dec 2001, Thomas Dodd wrote:

> Date: Fri, 14 Dec 2001 16:48:58 -0600
> From: Thomas Dodd <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: making less faster on large amounts of stdin data

Sorry for the late response; I did not check the redhat-devel list for a few days (and was not on the cc: list).
> John Summerfield wrote:
> >
> > This is a silly, negative response. If the patch does what Wojtek says,
> > then IMV it should be applied to the source.
>
> A patch to speed up a strange use of a program is what
> seems silly. less (and more) are interactive. why use
> them in a non interactive way? What's the purpose?

The patch makes less faster when processing data coming from stdin. The common use is viewing the output of programs that generate large amounts of data, and viewing compressed data without creating temporary files. The use of wc was only there to demonstrate the speedup more easily, and nothing more. Again, the reason for the patch was to make interactive usage of 'less' faster; wc merely forces non-interactive usage of less, so that the 'time' command can produce consistent results. I have verified that the speedup in interactive usage is at least the same.

Let me include some more specific information. My machine is a Compaq DeskPro, Pentium II (Deschutes)/450 MHz, 100 MHz FSB, Intel 440BX/ZX chipset, 128 MB RAM, running Red Hat Linux 6.2. 'less' is the standard binary from Red Hat Linux; 'lless' is the patched one.

I created a file compressed with lzop (which is very fast at decompression):

$ perl -e '$n=800000; $o=0; for ($i=1; $i<=$n; ++$i) { $l=sprintf "%s line %8d, offset %10d\n", "="x16, $i, $o; print $l; $o += length $l; }' | lzop -9 > test40M.lzo

Now test40M.lzo is 5813461 bytes long. Decompressing it in memory takes under 1 second:

[wp@wpst lfiles]$ time lzop -vt test40M.lzo
testing test40M.lzo OK

real    0m0.850s
user    0m0.570s
sys     0m0.100s

[wp@wpst lfiles]$ time lzop -vt test40M.lzo
testing test40M.lzo OK

real    0m0.651s
user    0m0.620s
sys     0m0.030s

Decompressing it to a pipe and processing the result with wc takes about 2 seconds:

[wp@wpst lfiles]$ time lzop -dc test40M.lzo | wc
 800000 4000000 40000000

real    0m2.264s
user    0m2.050s
sys     0m0.200s

Adding lless before wc raises the time to about 7 seconds:

[wp@wpst lfiles]$ time lzop -dc test40M.lzo | lless | wc
 800000 4000000 40000000

real    0m7.269s
user    0m5.690s
sys     0m0.970s

[wp@wpst lfiles]$ time lzop -dc test40M.lzo | lless | wc
 800000 4000000 40000000

real    0m6.850s
user    0m5.440s
sys     0m1.170s

When using lless interactively (that is, I run 'lzop -dc test40M.lzo | lless', press G to go to the end of the data, and measure the time from pressing G until lless is ready to accept keystrokes other than Ctrl-C), the measured time was 6.2-6.4 seconds, very close to the timing of the 'artificial' test with wc.

Now the same test with the standard less binary:

[wp@wpst lfiles]$ time lzop -dc test40M.lzo | less | wc
 800000 4000000 40000000

real    2m44.373s
user    2m41.600s
sys     0m1.840s

which is more than 20 times slower for that 'artificial' usage of less than with lless.

If I run 'lzop -dc test40M.lzo | less', press G, and wait for less to become fully functional (that is, until it has counted all the lines), it takes 8 minutes 30 seconds, which is about 80 times slower than the patched version. Of those 8 minutes 30 seconds, 2 minutes 41 seconds is spent reading data into buffers (very close to the non-interactive test), and the rest (about 5 minutes 50 seconds) is spent calculating the line numbers. These timings push the limits of my patience rather hard (especially when I have to stare at the screen with a stopwatch in my hand).

> > It has no significant impact on its size. I can't tell whether there's
> > an adverse impact on small machines - if so, then it needs to calculate
> > a buffer size and use that, and that's more involved, but if Wojtek's
> > right, worth doing.
>
> Since he said to run on a 128M+ system,
> I take it a low mem system would have trouble.

I used it on a 120 MB system and had no swapping for the amount of data in my example.
On a smaller system one would see swapping, because all the data produced by the script (40 MB) has to stay in memory for the less process. On low-memory systems (say 32 MB or 48 MB of RAM), putting that much data into virtual memory would be a bad idea; one should use temporary files instead.

> > A 100 x increase in the buffer size is a lot if
> > the current buffer is 1MB, but not much if it's only 1KB.

The current code uses buffers of 1 KB each.

> Would making it dynamic not slow it down and negate
> the improvement?

Some slowdown would surely result from making it dynamic. I do not think it would be noticeable, but I would have to test to make sure.

> > I suggest offering the patch to its author. See
> > http://www.greenwoodsoftware.com/less/
>
> That's reasonable.

I will try that. However, less is meant to be portable to many systems, including DOS (if that can be called an operating system), where allocating 100 kB of memory is problematic in both real and 16-bit protected mode. On the other hand, most current Linux distributions are oriented towards hardware that was high-end a few years ago; I do not think installing Red Hat Linux 7.2 on a machine with less than 32 MB RAM would be convenient, or even possible (using the installation program). And it is quite common to have desktop machines with 512 MB or more of RAM. When one wants to view a large LZO- or gzip-compressed file on such a machine, the speed of the unpatched less, as bundled with Red Hat Linux, is unsatisfactory even for the most patient people. The larger the data, the greater the observed slowdown (as I mentioned in my original post, the time is roughly proportional to the square of the size of the stdin/piped data).

> -Thomas

Again, I hope this is of use to some people.

Best regards,
Wojtek

PS. When I mentioned 40 MB of data in my example, it is actually 40 million bytes. I hope that will be forgiven.
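PS2. To illustrate why the time grows roughly with the square of the input size, here is a toy model in Python. This is not less's actual code, only an assumed cost model: to service a request at some byte offset, the pager walks its chain of fixed-size buffers from the head until it finds the one covering that offset.

```python
# Toy cost model for a pager that keeps piped input in a chain of
# fixed-size buffers and locates the buffer for an offset by a
# linear walk from the head of the chain.  Illustrative only --
# NOT the real less implementation.

BUFSIZE_SMALL = 1024        # 1 KB buffers, as in the stock binary
BUFSIZE_LARGE = 100 * 1024  # ~100x larger, as in the patch

def walk_cost(total_bytes, bufsize):
    """Total buffers visited when each buffer is reached by a
    linear walk from the head of the chain."""
    nbufs = total_bytes // bufsize
    # Reaching buffer i costs i steps; summed over all buffers this
    # is nbufs*(nbufs-1)/2 steps, i.e. O((total_bytes/bufsize)^2).
    return nbufs * (nbufs - 1) // 2

n = 40_000_000  # ~40 million bytes, as in the test file
small = walk_cost(n, BUFSIZE_SMALL)
large = walk_cost(n, BUFSIZE_LARGE)
print(f"1 KB buffers:   {small:>15,} buffer visits")
print(f"100 KB buffers: {large:>15,} buffer visits")
print(f"ratio: ~{small // large}x")
```

In this model a 100x larger buffer cuts the number of buffer visits by roughly 100^2 = 10,000x; the measured speedup (20-80x) is smaller because the linear cost of reading and storing the data itself does not change.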
_______________________________________________
Redhat-devel-list mailing list
[EMAIL PROTECTED]
https://listman.redhat.com/mailman/listinfo/redhat-devel-list