Tim Vanderhoek wrote:
> 
> On Thu, Jul 29, 1999 at 01:59:45AM +0900, Daniel C. Sobral wrote:
> >
> > Sorry, but a simplistic analysis like that just won't cut for grep.
> > The algorithmic complexity is highly relevant here. Try this:
> 
> Algorithmic complexity!?!

Yup.

> It's a freaking grep application.  There is no freaking algorithmic
> complexity.

Pattern matching is one of the prime examples of algorithmic
complexity.

You can add complexity very trivially.

> At least not outside of our regex library, anyways.  

I had not looked at the source, so I didn't know exactly how the
application did it's stuff.

Now I did, and I'll comment. Let's say the number of patterns is N,
and the total number of characters to be examined is S. Let's call
the "unmodified" complexity C, just for the sake of simplifying
comparision using a dangerous simplification.

First, the new grep uses fgetln(). fgetln() searches for a new line.
So each character is examined (at least) twice. That's C+S*read
already. GNU Grep uses mmap() (or read(), but not in FreeBSD), so it
doesn't incur in this additional complexity.

Also, fgetln() will copy the line buffer from time to time, though
that's not a simple computation, and probably of little
significance.

In addition to that, the new grep copies the fgrepln() result each
time. Add S*copy to C.

Next, the new grep tests the lines against each pattern separately!
GNU grep doesn't.

That's just *outside* the regexp library. Now, whether the
complexity is inside or outside the regexp library, I don't care.
It's complexity all the same. So it *must* be factored in.

> The test you
> suggested doesn't show anything about that algorithmic complexity,
> though.

Yeah? Try to back that with the results of the tests I suggested.

> If we have a slow regex library, though, I would consider that a
> separate problem from a slow grep.

If the f*cking grep is f*cking slow, I don't give a f*ck where the
problem is located! It just *IS*. GNU grep uses gnu regexp library,
the new grep uses our own. If changing greps means changing to a
library whose algorithm complexity is greater, then that *DOES*
count against the change.

For instance, a quick browse over GNU greps shows the gnu regexp
library can factor in multiple patterns. That is not being done by
the new grep. Does our regexp library support that?

Now, here is a quick and dirty fix for the repeated malloc()/free().
Notice that this is what fgetln() does, in fact. I'm afraid, though,
that's this is not anywhere near what would be needed by far to put
the new grep anywhere near the league of GNU grep.

I like the idea of a readable code, I like the idea of a BSD
license, but it would be damn silly to replace a clearly superior
grep, and that's where the thing stands right now.

--- util.c.orig Thu Jul 29 19:14:17 1999
+++ util.c      Thu Jul 29 20:49:16 1999
@@ -107,6 +107,8 @@
 
        ln.file = fn;
        ln.line_no = 0;
+       ln.bufsize = 81; /* Magical constants, yeah! */
+       ln.dat = grep_malloc(81);
        linesqueued = 0;
 
        if (Bflag > 0)
@@ -115,11 +117,14 @@
                ln.off = grep_tell();
                if ((tmp = grep_getln(&ln.len)) == NULL)
                        break;
-               ln.dat = grep_malloc(ln.len + 1);
+               if (ln.bufsize < ln.len + 1)
+                       ln.dat = grep_realloc(ln.dat, ln.len + 1);
                memcpy(ln.dat, tmp, ln.len);
-               ln.dat[ln.len] = 0;
                if (ln.len > 0 && ln.dat[ln.len - 1] == '\n')
                        ln.dat[--ln.len] = 0;
+               else
+                       ln.dat[ln.len] = 0;
+
                ln.line_no++;
 
                z = tail;
@@ -127,9 +132,9 @@
                        enqueue(&ln);
                        linesqueued++;
                }
-               free(ln.dat);
                c += t;
        }
+       free(ln.dat);
        if (Bflag > 0)
                clearqueue();
        grep_close();
--- grep.h.orig Thu Jul 29 20:47:52 1999
+++ grep.h      Thu Jul 29 20:48:34 1999
@@ -35,6 +35,7 @@
 
 typedef struct {
        size_t           len;
+       size_t           bufsize;
        int              line_no;
        int              off;
        char            *file;


--
Daniel C. Sobral                        (8-DCS)
d...@newsguy.com
d...@freebsd.org

        "Is it true that you're a millionaire's son who never worked a day
in your life?"
        "Yeah, I guess so."
        "Lemme tell you, son, you ain't missed a thing."



To Unsubscribe: send mail to majord...@freebsd.org
with "unsubscribe freebsd-hackers" in the body of the message

Reply via email to