On Fri, 14 Dec 2001, Thomas Dodd wrote:

> Date: Fri, 14 Dec 2001 16:48:58 -0600
> From: Thomas Dodd <[EMAIL PROTECTED]>
> Reply-To: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: making less faster on large amounts of stdin data

Sorry for the late response; I did not check the redhat-devel list for a few days (and was not on the cc: list).
> John Summerfield wrote:
> >
> > This is a silly, negative response. If the patch does what Wojtek says,
> > then IMV it should be applied to the source.
>
> A patch to speed up a strange use of a program is what
> seems silly. less (and more) are interactive. why use
> them in a non interactive way? What's the purpose?

The patch makes less faster when processing data coming from stdin. The common use is viewing the output of programs that generate large amounts of data, and viewing compressed data without creating temporary files. The use of wc was only there to demonstrate the speedup more easily, and nothing more. Again, the reason for the patch was to make interactive usage of 'less' faster; wc merely forces non-interactive usage of less, so that the 'time' command can produce consistent results. I have verified that the speedup in interactive usage is at least the same.

Let me include some more specific information. My machine is a Compaq DeskPro, Pentium II (Deschutes)/450 MHz, 100 MHz FSB, Intel 440BX/ZX chipset, 128 MB RAM, running Red Hat Linux 6.2. 'less' is the standard binary from Red Hat Linux; 'lless' is the patched one.

I created a file compressed with lzop (which is very fast at decompression):

$ perl -e '$n=800000; $o=0; for ($i=1; $i<=$n; ++$i) { $l=sprintf "%s line %8d, offset %10d\n", "="x16, $i, $o; print $l; $o += length $l; }' | lzop -9 > test40M.lzo

Now test40M.lzo is 5813461 bytes long. Decompressing it in memory takes under 1 second:

[wp@wpst lfiles]$ time lzop -vt test40M.lzo
testing test40M.lzo OK

real    0m0.850s
user    0m0.570s
sys     0m0.100s

[wp@wpst lfiles]$ time lzop -vt test40M.lzo
testing test40M.lzo OK

real    0m0.651s
user    0m0.620s
sys     0m0.030s

Decompressing it to a pipe and processing the result with wc takes about 2 seconds:

[wp@wpst lfiles]$ time lzop -dc test40M.lzo | wc
 800000 4000000 40000000

real    0m2.264s
user    0m2.050s
sys     0m0.200s

Adding lless before wc raises the time to about 7 seconds:

[wp@wpst lfiles]$ time lzop -dc test40M.lzo | lless | wc
 800000 4000000 40000000

real    0m7.269s
user    0m5.690s
sys     0m0.970s

[wp@wpst lfiles]$ time lzop -dc test40M.lzo | lless | wc
 800000 4000000 40000000

real    0m6.850s
user    0m5.440s
sys     0m1.170s

When using lless interactively (that is, I run 'lzop -dc test40M.lzo | lless', press G to go to the end of the data, and measure the time from pressing G until lless is ready to accept keystrokes other than Ctrl-C), the measured time was 6.2-6.4 seconds, very close to the timing of the 'artificial' test with wc.

Now the same test with the standard less binary:

[wp@wpst lfiles]$ time lzop -dc test40M.lzo | less | wc
 800000 4000000 40000000

real    2m44.373s
user    2m41.600s
sys     0m1.840s

which is more than 20 times slower for that 'artificial' usage of less than with lless.

If I run 'lzop -dc test40M.lzo | less', press G, and wait for less to become fully functional (that is, until it has counted all the lines), it takes 8 minutes 30 seconds, which is about 80 times slower than the patched version. Of those 8 minutes 30 seconds, 2 minutes 41 seconds is spent reading data into buffers (very close to the non-interactive test), and the rest (about 5 minutes 50 seconds) is spent calculating the line numbers. These timings push the limits of my patience rather hard (especially when I have to stare at the screen with a stopwatch in my hand).

> > It has no significant impact on its size. I can't tell whether there's
> > an adverse impact on small machines - if so, then it needs to calculate
> > a buffer size and use that, and that's more involved, but if Wojtek's
> > right, worth doing.
>
> Since he said to run on a 128M+ system,
> I take it a low mem system would have trouble.

I used it on a 120 MB system and had no swapping for the amount of data in my example.
On a smaller system one would see swapping, because all the data produced by the script (40 MB) has to stay in memory for the less process. On low-memory systems (say 32 MB or 48 MB of RAM), putting that much data into virtual memory would be a bad idea; one should use temporary files instead.

> > A 100 x increase in the buffer size is a lot if
> > the current buffer is 1MB, but not much if it's only 1KB.

The current code uses buffers of 1 KB each.

> Would making it dynamic not slow it down and negate
> the improvement?

Some slowdown would surely result from making it dynamic. I do not think it would be noticeable, but I would have to test to make sure.

> > I suggest offering the patch to its author. See
> > http://www.greenwoodsoftware.com/less/
>
> That's reasonable.

I will try that. However, less is meant to be portable to many systems, including DOS (if that can be called an operating system), where allocating 100 kB of memory is problematic in both real and 16-bit protected mode. On the other hand, most current Linux distributions are oriented towards hardware that was high-end a few years ago; I do not think installing Red Hat Linux 7.2 on a machine with less than 32 MB RAM would be convenient, or even possible (using the installation program). And it is quite common to have desktop machines with 512 MB or more of RAM. When one wants to view a large LZO- or gzip-compressed file on such a machine, the speed of the unpatched less, as bundled with Red Hat Linux, is unsatisfactory even for the most patient people. The larger the data, the greater the observed slowdown (as I mentioned in my original post, the time is roughly proportional to the square of the size of the stdin/piped data).

> -Thomas

Again, I hope this is of use to some people.

Best regards,
Wojtek

PS. When I mentioned 40 MB of data in my example, it is actually 40 million bytes. I hope that will be forgiven.
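PS2. To illustrate why the time grows roughly with the square of the input size, here is a toy model in Python. This is not less's actual code, only an assumed cost model: to service a request at some byte offset, the pager walks its chain of fixed-size buffers from the head until it finds the one covering that offset.

```python
# Toy cost model for a pager that keeps piped input in a chain of
# fixed-size buffers and locates the buffer for an offset by a
# linear walk from the head of the chain.  Illustrative only --
# NOT the real less implementation.

BUFSIZE_SMALL = 1024        # 1 KB buffers, as in the stock binary
BUFSIZE_LARGE = 100 * 1024  # ~100x larger, as in the patch

def walk_cost(total_bytes, bufsize):
    """Total buffers visited when each buffer is reached by a
    linear walk from the head of the chain."""
    nbufs = total_bytes // bufsize
    # Reaching buffer i costs i steps; summed over all buffers this
    # is nbufs*(nbufs-1)/2 steps, i.e. O((total_bytes/bufsize)^2).
    return nbufs * (nbufs - 1) // 2

n = 40_000_000  # ~40 million bytes, as in the test file
small = walk_cost(n, BUFSIZE_SMALL)
large = walk_cost(n, BUFSIZE_LARGE)
print(f"1 KB buffers:   {small:>15,} buffer visits")
print(f"100 KB buffers: {large:>15,} buffer visits")
print(f"ratio: ~{small // large}x")
```

In this model a 100x larger buffer cuts the number of buffer visits by roughly 100^2 = 10,000x; the measured speedup (20-80x) is smaller because the linear cost of reading and storing the data itself does not change.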
_______________________________________________
Redhat-devel-list mailing list
[EMAIL PROTECTED]
https://listman.redhat.com/mailman/listinfo/redhat-devel-list