Re: replacing grep(1)

1999-07-31 Thread Daniel C. Sobral

John-Mark Gurney wrote:
>
> right now, I'm trying to think of a way to eliminate the fgetln searching
> for end of line... of course this would eliminate some of the simplicity
> of design, but we can get a BIG speed increase if we simply don't scan for
> the new line unless we NEED to...  and if we do, why not use regexec to
> search for us?

As Dillon said, the decrease in speed of the scan might not be that
great. On the other hand, a decent pattern matching algorithm *does
not* examine every character (which is why GNU grep performs so much
better with bigger patterns).
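
As a rough illustration of how a matcher can avoid examining every
character, here is a minimal Boyer-Moore-Horspool sketch for fixed
strings. It is not GNU grep's code; horspool_find is a hypothetical name
chosen for this example. The longer the pattern, the larger the jumps on
a mismatch, which fits the observation about bigger patterns.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Boyer-Moore-Horspool search (illustrative sketch, hypothetical helper).
 * Returns the offset of the first occurrence of pat[0..m) in text[0..n),
 * or (size_t)-1 if there is none.
 */
size_t horspool_find(const unsigned char *text, size_t n,
                     const unsigned char *pat, size_t m)
{
    size_t skip[UCHAR_MAX + 1];
    size_t i;

    assert(text != NULL && pat != NULL);
    if (m == 0)
        return 0;
    if (m > n)
        return (size_t)-1;

    /* A byte that does not occur in the pattern lets us jump m bytes. */
    for (i = 0; i <= UCHAR_MAX; i++)
        skip[i] = m;
    /* Bytes that occur in the pattern (except its last) allow shorter jumps. */
    for (i = 0; i + 1 < m; i++)
        skip[pat[i]] = m - 1 - i;

    i = 0;
    while (i <= n - m) {
        /* Compare the last byte of the window first, then the rest. */
        if (text[i + m - 1] == pat[m - 1] &&
            memcmp(text + i, pat, m - 1) == 0)
            return i;
        /* Shift by the skip value of the byte under the window's end. */
        i += skip[text[i + m - 1]];
    }
    return (size_t)-1;
}
```

With a 7-byte pattern such as "printer", a mismatching byte that is not
in the pattern moves the window forward 7 bytes at once, so most of the
input is never read.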

--
Daniel C. Sobral(8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]

- Jordan, God, what's the difference?
- God doesn't belong to the -core.




To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: replacing grep(1)

1999-07-31 Thread Sheldon Hearn



On Fri, 30 Jul 1999 22:07:26 -0400, Tim Vanderhoek wrote:

> b$ time ./grep -E '(vt100)|(printer)' longfile > /dev/null
> b$ time grep '(vt100)|(printer)' longfile > /dev/null

You think that's fair? Surely you can't expect Jamie's extended regex
support to outperform GNU's simple regex support? :-)

Ciao,
Sheldon.





Re: replacing grep(1)

1999-07-31 Thread Tim Vanderhoek

On Sat, Jul 31, 1999 at 11:56:16PM +0200, Sheldon Hearn wrote:
>
> > b$ time ./grep -E '(vt100)|(printer)' longfile > /dev/null
> > b$ time grep '(vt100)|(printer)' longfile > /dev/null
>
> You think that's fair? Surely you can't expect Jamie's extended regex
> support to outperform GNU's simple regex support? :-)

GNU has no simple regex support.

Actually, neither did Jamie's by the time I did that test, but I added
the -E flag to make it obvious what was going on.  :)

I rather hope that the rumoured newer version of H. Spencer's regex
lib is faster...  Being as slow for that pattern as it is has got to
be a bug of some sort...  It's actually faster to scan the file twice,
once for the first string and then for the second.


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-31 Thread James Howard

On Sat, 31 Jul 1999, Tim Vanderhoek wrote:

> I rather hope that the rumoured newer version of H. Spencer's regex
> lib is faster...  Being as slow for that pattern as it is has got to
> be a bug of some sort...  It's actually faster to scan the file twice,
> once for the first string and then for the second.

If it is not, how about linking it with libregex?  I realize it is GNU
too, but it will be there whether or not grep gets replaced and the
authors were at least kind enough to LGPL it instead.  Hey, maybe someone
who knows more about regular expressions than I do would feel compelled to
rewrite GNU regex... :)  I bet the existing Spencer libraries would be a
good starting point and maybe the rumored new version is a great starting
point...  But that's enough hint dropping...

Jamie



Re: replacing grep(1)

1999-07-30 Thread Dag-Erling Smorgrav

James Howard [EMAIL PROTECTED] writes:
> DES tells me he has a new version (0.10) which mmap()s.  It supposedly
> cuts the run time down significantly, I do not have the numbers in front
> of me.  Unfortunately he has not posted this version yet so I cannot
> download it and run it myself.

It's in the usual place (ftp://ftp.ofug.org/pub/grep/).

> He also says that if mmap fails, he drops
> back to stdio.  This should only happen in the NFS case, the > 2G case,
> etc.

Any case in which a) the file is too large to mmap, b) the file is not
a regular file, or c) mmap() fails (e.g. NFS).
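
The fallback rule above can be sketched roughly as follows. This is not
the code from grep 0.10; struct filebuf, load_file, load_stdio and
unload_file are hypothetical names, and the sketch simply reads the whole
file through stdio whenever the file is not a regular file, does not fit
in a size_t, or mmap() fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical container; not a type from grep 0.10. */
struct filebuf {
    char  *data;
    size_t len;
    int    mapped;   /* 1: data came from mmap(); 0: from malloc() */
};

/* Slow path: read everything through stdio into a growing buffer. */
static int load_stdio(int fd, struct filebuf *fb)
{
    FILE  *fp = fdopen(fd, "r");
    size_t cap = 65536, n;
    char  *buf, *p;

    if (fp == NULL) {
        close(fd);
        return -1;
    }
    if ((buf = malloc(cap)) == NULL) {
        fclose(fp);
        return -1;
    }
    fb->len = 0;
    while ((n = fread(buf + fb->len, 1, cap - fb->len, fp)) > 0) {
        fb->len += n;
        if (fb->len == cap) {            /* buffer full: double it */
            if ((p = realloc(buf, cap * 2)) == NULL) {
                free(buf);
                fclose(fp);
                return -1;
            }
            buf = p;
            cap *= 2;
        }
    }
    fclose(fp);                          /* also closes fd */
    fb->data = buf;
    fb->mapped = 0;
    return 0;
}

/* Try mmap() for non-empty regular files; otherwise fall back to stdio. */
int load_file(const char *path, struct filebuf *fb)
{
    struct stat st;
    int fd;

    assert(path != NULL && fb != NULL);
    if ((fd = open(path, O_RDONLY)) < 0)
        return -1;
    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0 &&
        (uintmax_t)st.st_size <= SIZE_MAX) {
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            close(fd);                   /* the mapping outlives the fd */
            fb->data = p;
            fb->len = (size_t)st.st_size;
            fb->mapped = 1;
            return 0;
        }
    }
    return load_stdio(fd, fb);           /* cases a), b) and c) */
}

/* Release whatever load_file() handed out. */
void unload_file(struct filebuf *fb)
{
    if (fb->mapped)
        munmap(fb->data, fb->len);
    else
        free(fb->data);
}
```

A caller would search fb.data[0..fb.len) the same way in both cases and
call unload_file() afterwards; only the loading step differs.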

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]





Re: replacing grep(1)

1999-07-30 Thread Dag-Erling Smorgrav

John-Mark Gurney [EMAIL PROTECTED] writes:
> it was VERY simple to do... and attached is the patch... this uses the
> option REG_STARTEND to do what the copy was trying to do... all of the
> code to use REG_STARTEND was already there, it just needed to be enabled..

Funnily, I experience a near-doubling of running time with similar
patches.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]






Re: replacing grep(1)

1999-07-30 Thread Daniel C. Sobral

John-Mark Gurney wrote:
>
> ok, I just made a patch to eliminate the copy that was happening in
> procfile, and it sped up a grep of a 5meg termcap from about 2.9sec
> down to .6 seconds... this includes time spent profiling the program..
> GNU grep w/o profiling only takes .15sec so we ARE getting closer to
> GNU grep...

Rather impressive. But... did you run these tests more than once, to
account for vm caching?

> it was VERY simple to do... and attached is the patch... this uses the
> option REG_STARTEND to do what the copy was trying to do... all of the
> code to use REG_STARTEND was already there, it just needed to be enabled..

Just for the record... :-) This eliminates one of the "added
complexities" I pointed out. 
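
For readers unfamiliar with REG_STARTEND, here is a minimal sketch of the
idea: regexec() is told which byte range to search through pmatch[0], so
the line does not have to be copied into a NUL-terminated buffer first.
match_span is a hypothetical helper, not code from the patch under
discussion. REG_STARTEND is a BSD extension, so the sketch assumes it may
be missing and falls back to the copy in that case.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: does re match anywhere inside buf[0..len)?
 * Returns 1 on a match, 0 on no match, -1 on error.
 */
int match_span(const regex_t *re, const char *buf, size_t len)
{
    regmatch_t pm;
    int rc;

    assert(re != NULL && buf != NULL);
#ifdef REG_STARTEND
    /* pm delimits the bytes to search; no terminator is needed. */
    pm.rm_so = 0;
    pm.rm_eo = (regoff_t)len;
    rc = regexec(re, buf, 1, &pm, REG_STARTEND);
#else
    /* Portable fallback: copy the span so that it is NUL-terminated. */
    char *copy = malloc(len + 1);

    if (copy == NULL)
        return -1;
    memcpy(copy, buf, len);
    copy[len] = '\0';
    rc = regexec(re, copy, 1, &pm, 0);
    free(copy);
#endif
    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

Which of the two branches is faster is exactly what the thread disagrees
about, so the sketch makes no claim either way.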

--
Daniel C. Sobral(8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]

"Is it true that you're a millionaire's son who never worked a day
in your life?"
"Yeah, I guess so."
"Lemme tell you, son, you ain't missed a thing."







Re: replacing grep(1)

1999-07-30 Thread Dag-Erling Smorgrav

"Daniel C. Sobral" [EMAIL PROTECTED] writes:
> Dag-Erling Smorgrav wrote:
> > To be precise, I experience a 30% decrease in system time and a 100%
> > increase in user time when I use REG_STARTEND and eliminate the
> > malloc() / memcpy() calls in procfile().
> Could you please test my patch that removes malloc() but not
> memcpy()? Here it is again, though against an old version:

Yeah. You can do even better by declaring ln static and never
free()ing it.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]





Re: replacing grep(1)

1999-07-30 Thread Dag-Erling Smorgrav

"Daniel C. Sobral" [EMAIL PROTECTED] writes:
> Could you please test my patch that removes malloc() but not
> memcpy()? Here it is again, though against an old version:

Bingo. REG_STARTEND is significantly more expensive than memcpy().

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]





Re: replacing grep(1)

1999-07-30 Thread Tim Vanderhoek

On Fri, Jul 30, 1999 at 10:56:55PM +0900, Daniel C. Sobral wrote:
>
> I said that I did not care whether the thing is inside or outside
> the regexp library.

Yes, although I think at this point it's obvious we're coming at this
discussion from fairly different perspectives.  By the time you
brought-up complexity originally, I had more or less decided that I
did not want to see the new grep imported without significant speed
improvements and was concerned with how to improve grep.  Your
interest is in debating that point (fortunately arguing for the
side I agree with :).


> 4) grep -e 123 456 world.build

[I assume "grep -e 123 -e 124 world.build"]

> One can clearly see that GNU grep has a much better complexity in
> the cases of longer patterns or multiple patterns with common
> prefix.

Alright, someone else already mentioned to me in email that I
totally ignored what differences involved multiple patterns.
Combining multiple patterns is a big win if those two patterns have
a common prefix (I hadn't considered the case of similar patterns
before, actually).  Combining multiple patterns when they're
dissimilar doesn't appear to help much (which is the only case I had
considered -- my mistake, and also the reason I ignored what you
said about multiple patterns).

I'm surprised by the way GNU grep is able to handle longer patterns,
and I probably wouldn't have noticed it unless I'd taken some time to
examine the GNU source.

Congratulations, you win.  :)  The rest of your lengthy message mostly
goes on to repeat the fact that GNU grep is able to merge multiple
patterns with a common prefix (and postfix?) to good effect.


> It also shows that the new grep spends a lot of time in an activity
> not related to the search itself, since it does multiple patterns by

Well, duh.  This is really why my reaction to "complexity analysis" is
(still) what it is.  Complexity analysis is almost only useful for
comparing two different algorithms and the fact that the new grep
spends a lot of time doing things other than pattern searching is
quite obvious after a casual perusal of the source.  Complexity
analysis does not (directly) help improving an algorithm.  With the
possible exception of the idea of merging common prefixes, most of
this is not useful (at this stage) to improving grep.

If I was going to propose replacing the existing GNU grep, I would
(and always would) have done considerably more speed trials than the
simple one in my last message.


> It would seem that GNU grep is superior in the case of partial
> matches without a full match too, but the standard deviation for the

That is almost certainly something inside the regex library, which I
have repeatedly said I'm not interested in even looking at.  If our
regex library is too slow, then we need to look into the newer one that
Henry Spencer is rumoured to be sitting on.


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-30 Thread Tim Vanderhoek

On Fri, Jul 30, 1999 at 03:27:20PM +0200, Dag-Erling Smorgrav wrote:

> > it was VERY simple to do... and attached is the patch... this uses the
> > option REG_STARTEND to do what the copy was trying to do... all of the
> > code to use REG_STARTEND was already there, it just needed to be enabled..
>
> Funnily, I experience a near-doubling of running time with similar
> patches.

Strange...  His patches made grep on my system much faster than the
original 0.10 and almost as fast as GNU grep.

b$ /usr/bin/time ./grep-10 -e printer longfile > /dev/null
        1.16 real         0.97 user         0.19 sys
b$ /usr/bin/time ./grep-10-jmg -e printer longfile > /dev/null
        0.48 real         0.43 user         0.04 sys
b$ /usr/bin/time grep -e printer longfile > /dev/null
        0.28 real         0.09 user         0.18 sys

This is one of the original Celerons, FWIW.  Once-in-a-while that gives
me performance numbers somewhat different from any other Intel.


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-30 Thread Tim Vanderhoek

On Fri, Jul 30, 1999 at 03:27:20PM +0200, Dag-Erling Smorgrav wrote:
>
> Funnily, I experience a near-doubling of running time with similar
> patches.

Incidentally, it seems that it's not possible to assume that our
regex library is even anywhere in the same league as the GNU regex
library.

b$ time ./grep -E '(vt100)|(printer)' longfile > /dev/null

real    0m21.284s
user    0m22.034s
sys     0m0.083s

Now, with a profiled executable with optimization turned off it
takes about 25 seconds.  Regardless, it appears to spend 98% of
its time in regexec(), which is good, since that's where it should
be spending time.

[I had been intending to combine multiple patterns, ultimately
 combining in a '\n' to avoid the memchr() in mmopen].

b$ time grep '(vt100)|(printer)' longfile > /dev/null

real    0m0.267s
user    0m0.109s
sys     0m0.157s

98% * 20 = ~19...  Without an improved regex library, any mildly
complicated pattern will bring the new grep to its knees.

This could be the dfa helping GNU grep more than having a better
regexp library...  Probably both.

I wonder how well the devel/pcre port would do POSIX regular expressions.


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-30 Thread Dag-Erling Smorgrav
Tim Vanderhoek vand...@ecf.utoronto.ca writes:
> I do.  Still far too slow.  I'll work on this tomorrow, since that
> seems the only way to convince people that mmap is not such a big
> win.  :-(

mmap() gives a fourfold speed increase. I call that a big win.

I have a few other ideas which will make 0.11 even faster.

DES
-- 
Dag-Erling Smorgrav - d...@flood.ping.uio.no


Re: replacing grep(1)

1999-07-30 Thread Dag-Erling Smorgrav
Dag-Erling Smorgrav d...@flood.ping.uio.no writes:
> John-Mark Gurney gurne...@efn.org writes:
> > it was VERY simple to do... and attached is the patch... this uses the
> > option REG_STARTEND to do what the copy was trying to do... all of the
> > code to use REG_STARTEND was already there, it just needed to be enabled..
> Funnily, I experience a near-doubling of running time with similar
> patches.

To be precise, I experience a 30% decrease in system time and a 100%
increase in user time when I use REG_STARTEND and eliminate the
malloc() / memcpy() calls in procfile().

DES
-- 
Dag-Erling Smorgrav - d...@flood.ping.uio.no


Re: replacing grep(1)

1999-07-30 Thread Daniel C. Sobral
Tim Vanderhoek wrote:
>
> I'm sorry.  I've read your message and have decided that you're wrong.

Not that you did bother to counter the points I made. You only
comment on the one thing I said was probably insignificant. Are you
taking your clues from me? :-)

> Outside of the regexp library, algorithmic complexity is not a factor
> here.  It would take a beanbag to write anything other than an O(N)
> algorithm.

I said that I did not care whether the thing is inside or outside
the regexp library. And a N*search+N*copy, as opposed to N*search,
*is* relevant. And that N*copy is outside regexp.

And, just for the reference, GNU Grep uses a dfa to identify likely
matches before letting gnuregexp work.

> The proposed grep is slow, very slow, and I've sent a long message to
> James outlining how to make it much faster, but algorithmic complexity
> is not an issue.

So you say without having checked.

> > > The test you suggested doesn't show anything about that algorithmic
> > > complexity, though.
> >
> > Yeah? Try to back that with the results of the tests I suggested.
>
> No, it's not even worth my time.
>
> Now look.  You've gotten me so upset I actually went and did a simple
> test.  The test showed I'm right and you're wrong.  Catting X number
> of copies of /etc/termcap into longfile causes the time grep uses
> to pass longfile searching for all occurrences of "printer" causes
> it to use an extra 0.03 seconds for every repetition of /etc/termcap
> in longfile.
>
> Gee, linear complexity wrt to file length.  Who could've guessed!?

That does not *begin* to cover the cases I outlined.

> What'ya bet GNU grep also exhibits linear complexity?  :)
>
> Admit it, you jumped in with some bullshit about complexity when had
> you taken the time to look into what James meant when he said "it now
> spends 50% of its time in procline()" you would have kept quiet,
> realizing that he was talking about a constant factor in the
> complexity analysis, a subject where comments such as "it now spends
> 50% of its time in procline()" are relevant.

Ok, here is the _DATA_ backing my bullshit.

First table: searching for non-existent patterns

Tests:
1) grep -e 123 world.build
2) grep -e 123456 world.build
3) grep -e 123 124 world.build
4) grep -e 123 456 world.build

These were made with GNU grep, version 0.9 of the new grep, and
that version with the patch I sent previously (this latter was
unintended -- only after completing the test I realized the
executable was the one with my patches).

Each test was repeated five times after both the executable and the
target file were cached. I show here the averages of the "real" line
from time. The user and sys values were actually more interesting,
but with much greater deviation. :-)

        GNU grep    New grep    Patched grep
1)      0.09945s    0.4460s     0.3870s
2)      0.07225s    0.4424s     0.3894s
3)      0.12200s    0.6352s     0.5814s
4)      0.18240s    0.6364s     0.5796s

One can clearly see that GNU grep has a much better complexity in
the cases of longer patterns or multiple patterns with common
prefix.

It also shows that the new grep spends a lot of time in an activity
not related to the search itself, since it does multiple patterns by
calling regexec() multiple times, but 2:1 is not the proportion you
see up there. Also, the patch I introduced to eliminate
N*(malloc()+free()), N being the number of lines searched,
significantly reduces that overhead (overhead as in, *beyond* the
time spent in regexec()).
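
One way to avoid calling regexec() once per pattern per line is to join
the patterns into a single extended regular expression and compile that
once. The sketch below shows only that idea; join_patterns and
compile_combined are hypothetical helpers, not functions from either
grep, and they assume every pattern is already valid ERE syntax. Whether
this is a win depends on the regex library: elsewhere in the thread an
alternation of two strings is reported to take about 21 seconds with the
system library.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Join n ERE patterns as "(p1)|(p2)|...". Hypothetical helper.
 * Returns a malloc()ed string the caller must free, or NULL.
 */
char *join_patterns(const char *const *pats, size_t n)
{
    size_t total = 1;                 /* room for the terminating NUL */
    size_t i;
    char *out, *p;

    assert(pats != NULL && n > 0);
    for (i = 0; i < n; i++)
        total += strlen(pats[i]) + 3; /* '(' + ')' + '|' */
    if ((out = malloc(total)) == NULL)
        return NULL;

    p = out;
    for (i = 0; i < n; i++) {
        size_t len = strlen(pats[i]);

        if (i > 0)
            *p++ = '|';
        *p++ = '(';
        memcpy(p, pats[i], len);
        p += len;
        *p++ = ')';
    }
    *p = '\0';
    return out;
}

/* Compile all patterns as one regex; returns a regcomp() error code. */
int compile_combined(regex_t *re, const char *const *pats, size_t n)
{
    char *joined = join_patterns(pats, n);
    int rc;

    if (joined == NULL)
        return REG_ESPACE;
    rc = regcomp(re, joined, REG_EXTENDED | REG_NOSUB);
    free(joined);
    return rc;
}
```

Each line then needs a single regexec() call regardless of how many
patterns were given on the command line.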

Second table: searching for existing patterns

Tests:
1) grep -e net world.build /dev/null
2) grep -e netipx world.build /dev/null
3) grep -e netinet world.build /dev/null
4) grep -e netinet -e netipx world.build /dev/null

        GNU grep    New grep
1)      0.10750s    0.57060s
2)      0.07575s    0.46375s
3)      0.07416s    0.46700s
4)      0.09950s    0.67440s

Though these tests involve more factors because each has a different
number of matches, it again shows very clearly that the new grep has
increased complexity in the case of multiple patterns. See there,
cases 1 and 4. The latter has *less* matches than the former.

Third table: non-existing pattern on different sized files

Tests:
1) grep 123 world.build
2) grep 123 world.build.2 (two times world.build)
3) grep 123 world.build.3 (three times world.build)
4) grep 123 world.build.4 (four times world.build)

        GNU grep    New grep
1)      0.09600s    0.44750s
2)      0.16425s    0.89075s
3)      0.24760s    1.30850s
4)      0.31833s    1.75900s

Linear, it would seem... but, alas, this is to be expected. Grep
searches inside lines, and the above does not increase the size of a
line, only the number of them. Still, it's a relief that the new
grep does not have a worse performance in this most simple test.

Fourth table: non-existing patterns on files with different line
sizes.

Tests:
1) grep abc line10
2) grep abc line20
3) grep 124 line10
4) 

Re: replacing grep(1)

1999-07-30 Thread Daniel C. Sobral
Dag-Erling Smorgrav wrote:
>
> To be precise, I experience a 30% decrease in system time and a 100%
> increase in user time when I use REG_STARTEND and eliminate the
> malloc() / memcpy() calls in procfile().

Could you please test my patch that removes malloc() but not
memcpy()? Here it is again, though against an old version:

--- util.c.orig Thu Jul 29 19:14:17 1999
+++ util.c      Thu Jul 29 20:49:16 1999
@@ -107,6 +107,8 @@
 
 	ln.file = fn;
 	ln.line_no = 0;
+	ln.bufsize = 81; /* Magical constants, yeah! */
+	ln.dat = grep_malloc(81);
 	linesqueued = 0;
 
 	if (Bflag > 0)
@@ -115,11 +117,14 @@
 		ln.off = grep_tell();
 		if ((tmp = grep_getln(&ln.len)) == NULL)
 			break;
-		ln.dat = grep_malloc(ln.len + 1);
+		if (ln.bufsize < ln.len + 1)
+			ln.dat = grep_realloc(ln.dat, ln.len + 1);
 		memcpy(ln.dat, tmp, ln.len);
-		ln.dat[ln.len] = 0;
 		if (ln.len > 0 && ln.dat[ln.len - 1] == '\n')
 			ln.dat[--ln.len] = 0;
+		else
+			ln.dat[ln.len] = 0;
+
 		ln.line_no++;
 
 		z = tail;
@@ -127,9 +132,9 @@
 			enqueue(&ln);
 			linesqueued++;
 		}
-		free(ln.dat);
 		c += t;
 	}
+	free(ln.dat);
 	if (Bflag > 0)
 		clearqueue();
 	grep_close();
--- grep.h.orig Thu Jul 29 20:47:52 1999
+++ grep.h      Thu Jul 29 20:48:34 1999
@@ -35,6 +35,7 @@
 
 typedef struct {
 	size_t	 len;
+	size_t	 bufsize;
 	int	 line_no;
 	int	 off;
 	char	*file;
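
The buffer-reuse idea behind the patch can be shown in isolation. This is
a standalone sketch, not a drop-in replacement for the code in util.c:
struct linebuf and linebuf_set are hypothetical names that loosely mirror
ln.dat and ln.bufsize. Unlike the posted hunk, the sketch records the new
capacity after growing, so a long line only costs a realloc() the first
time a line of that size is seen.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the dat/bufsize/len fields of the line struct. */
struct linebuf {
    char  *dat;
    size_t bufsize;   /* bytes currently allocated for dat */
    size_t len;       /* length of the stored line, without the newline */
};

/*
 * Copy src[0..len) into lb, growing the buffer only when it is too small,
 * and strip one trailing newline. Returns 0 on success, -1 on failure.
 */
int linebuf_set(struct linebuf *lb, const char *src, size_t len)
{
    assert(lb != NULL && src != NULL);
    if (lb->bufsize < len + 1) {
        size_t want = lb->bufsize ? lb->bufsize : 81;
        char *p;

        while (want < len + 1)
            want *= 2;
        if ((p = realloc(lb->dat, want)) == NULL)
            return -1;
        lb->dat = p;
        lb->bufsize = want;   /* remember the capacity for later lines */
    }
    memcpy(lb->dat, src, len);
    if (len > 0 && lb->dat[len - 1] == '\n')
        len--;
    lb->dat[len] = '\0';
    lb->len = len;
    return 0;
}
```

Start from a zeroed struct linebuf, call linebuf_set() once per line, and
free lb.dat once after the loop instead of once per line.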


--
Daniel C. Sobral(8-DCS)
d...@newsguy.com
d...@freebsd.org

Is it true that you're a millionaire's son who never worked a day
in your life?
Yeah, I guess so.
Lemme tell you, son, you ain't missed a thing.


Re: replacing grep(1)

1999-07-30 Thread John-Mark Gurney
Daniel C. Sobral scribbled this message on Jul 30:
> Dag-Erling Smorgrav wrote:
> >
> > To be precise, I experience a 30% decrease in system time and a 100%
> > increase in user time when I use REG_STARTEND and eliminate the
> > malloc() / memcpy() calls in procfile().
>
> Could you please test my patch that removes malloc() but not
> memcpy()? Here it is again, though against an old version:

weird, I was running your patch, and at first I would get from .69 up
to 1.03 seconds run time, but I can't seem to generate that problem
right now...  w/ your patches I'm getting around .67 to .7 seconds for:
time ./grep THIS /tmp/ports/freegrep/work/grep-0.10/termcap.long > /dev/null
0.68 real 0.63 user 0.03 sys
0.67 real 0.65 user 0.01 sys
0.67 real 0.63 user 0.03 sys
0.67 real 0.63 user 0.03 sys
0.67 real 0.66 user 0.00 sys
0.67 real 0.64 user 0.02 sys

summary of gprof output:
[3]   50.1  0.02  0.21  108213  procline [3]
[4]   46.7  0.02  0.19  108213  regexec [4]
[7]   28.5  0.13  0.00  108214  mmfgetln [7]
[10]   4.8  0.00  0.02    2393  grep_realloc [10]

with my patch and the exact same command, I get .58 to .59 seconds...
0.58 real 0.54 user 0.03 sys
0.58 real 0.53 user 0.04 sys
0.58 real 0.55 user 0.02 sys
0.58 real 0.57 user 0.00 sys
0.59 real 0.55 user 0.02 sys
0.58 real 0.55 user 0.02 sys

summary of gprof output:
[3]   57.1  0.04  0.19  108213  procline [3]
[4]   48.0  0.02  0.17  108213  regexec [4]
[7]   34.1  0.13  0.00  108214  mmfgetln [7]
[10]   2.0  0.01  0.00       1  _munmap [10]

(I include _munmap because realloc/malloc/free are in the 0.0% on my
patch)

and grep 0.10 w/o patches:
2.82 real 1.63 user 1.12 sys
2.79 real 1.53 user 1.20 sys
2.80 real 1.65 user 1.08 sys
2.84 real 1.67 user 1.10 sys
2.82 real 1.67 user 1.08 sys
2.91 real 1.66 user 1.14 sys

summary of gprof output:
[5]   55.1  1.12  0.00   74985  _madvise [5]
[7]   13.3  0.04  0.23  108213  regexec [7]
[9]    8.4  0.00  0.17  108217  grep_malloc [9]
[13]   6.5  0.13  0.00  108214  mmfgetln [13]

all of the programs were compiled w/ the exact same options... that is
I added -g -pg to the CFLAGS in the Makefile to generate profiling info..

I'm not sure about you, but on my k6/200, the STARTEND is more efficient
than the memcpy/realloc, and to tell you the truth, I can't see why it'd
be more efficient to copy possibly multiple kilobytes of data than to just
use indexes instead of modifying a ptr...

right now, I'm trying to think of a way to eliminate the fgetln searching
for end of line... of course this would eliminate some of the simplicity
of design, but we can get a BIG speed increase if we simply don't scan for
the new line unless we NEED to...  and if we do, why not use regexec to
search for us?
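
The "don't scan for the newline unless we need to" idea can be sketched
like this. To keep the example short it searches for a fixed string with
memchr()/memcmp() rather than calling regexec() on the whole buffer, which
is a swapped-in simplification and not what is proposed above;
count_matching_lines is a hypothetical name, and the needle is assumed to
contain no newline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the lines of buf[0..len) that contain needle. The buffer is
 * searched as a whole; a newline is looked for only after a hit.
 */
size_t count_matching_lines(const char *buf, size_t len, const char *needle)
{
    size_t m, pos = 0, count = 0;

    assert(buf != NULL && needle != NULL);
    m = strlen(needle);
    assert(m > 0);

    while (pos + m <= len) {
        /* Jump to the next place where a match could start. */
        const char *c = memchr(buf + pos, needle[0], len - pos - m + 1);
        const char *nl;

        if (c == NULL)
            break;
        pos = (size_t)(c - buf);
        if (memcmp(c, needle, m) != 0) {
            pos++;
            continue;
        }
        /* Hit: only now find where this line ends. */
        count++;
        nl = memchr(c, '\n', len - pos);
        if (nl == NULL)
            break;
        pos = (size_t)(nl - buf) + 1;   /* resume on the next line */
    }
    return count;
}
```

Lines without a match never have their end located at all; printing a
matching line would additionally need a backward scan for its start.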

-- 
  John-Mark Gurney  Voice: +1 541 684 8449
  Cu Networking   P.O. Box 5693, 97405

  The soul contains in itself the event that shall presently befall it.
  The event is only the actualizing of its thought. -- Ralph Waldo Emerson


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: replacing grep(1)

1999-07-30 Thread Tim Vanderhoek
On Fri, Jul 30, 1999 at 10:56:55PM +0900, Daniel C. Sobral wrote:
 
 I said that I did not care whether the thing is inside or outside
 the regexp library.

Yes, although I think at this point it's obvious we're coming at this
discussion from fairly different perspectives.  By the time you
brought-up complexity originally, I had more or less decided that I
did not want to see the new grep imported without significant speed
improvements and was concerned with how to improve grep.  Your
interest is in debating that point (fortunately arguing for the
side I agree with :).


 4) grep -e 123 456 world.build

[I assume grep -e 123 -e 124 world.build]

 One can clearly see that GNU grep has a much better complexity in
 the cases of longer patterns or multiple patterns with common
 prefix.

Alright, someone else already mentioned to me in email that I
totally ignored what differences involved multiple patterns.
Combining multiple patterns is a big win if those two patterns have
a common prefix (I hadn't considered the case of similar patterns
before, actually).  Combining multiple patterns when they're
dissimilar doesn't appear to help much (which is the only case I had
considered -- my mistake, and also the reason I ignored what you
said about multiple patterns).

I'm surprised by the way GNU grep is able to handle longer patterns,
and I probably wouldn't have noticed it unless I'd taken some time to
examine the GNU source.

Congratulations, you win.  :)  The rest of your lengthy message mostly
goes on to repeat the fact that GNU grep is able to merge multiple
patterns with a common prefix (and postfix?) to good effect.


 It also shows that the new grep spends a lot of time in an activity
 not related to the search itself, since it does multiple patterns by

Well, duh.  This is really why my reaction to complexity analysis is
(still) what it is.  Complexity analysis is almost only useful for
comparing two different algorithms and the fact that the new grep
spends a lot of time doing things other than pattern searching is
quite obvious after a casual perusal of the source.  Complexity
analysis does not (directly) help improving an algorithm.  With the
possible exception of the idea of merging common prefixes, most of
this is not useful (at this stage) to improving grep.

If I were going to propose replacing the existing GNU grep, I would
have done (and always would have done) considerably more speed trials
than the simple one in my last message.


 It would seem that GNU grep is superior in the case of partial
 matches without a full match too, but the standard deviation for the

That is almost certainly something inside the regex library, which I
have repeatedly said I'm not interested in even looking at.  If our
regex library is too slow, then we need to look into the newer one that
Henry Spencer is rumoured to be sitting on.


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-30 Thread Tim Vanderhoek
On Fri, Jul 30, 1999 at 03:27:20PM +0200, Dag-Erling Smorgrav wrote:

  it was VERY simple to do... and attached is the patch... this uses the
  option REG_STARTEND to do what the copy was trying to do... all of the
  code to use REG_STARTEND was already there, it just needed to be enabled..
 
 Funnily, I experience a near-doubling of running time with similar
 patches.

Strange...  His patches made grep on my system much faster than the
original 0.10 and almost as fast as GNU grep.

b$ /usr/bin/time ./grep-10 -e printer longfile > /dev/null
        1.16 real         0.97 user         0.19 sys
b$ /usr/bin/time ./grep-10-jmg -e printer longfile > /dev/null
        0.48 real         0.43 user         0.04 sys
b$ /usr/bin/time grep -e printer longfile > /dev/null
        0.28 real         0.09 user         0.18 sys

This is one of the original Celerons, FWIW.  Once-in-a-while that gives
me performance numbers somewhat different from any other Intel.


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-30 Thread Tim Vanderhoek
On Fri, Jul 30, 1999 at 03:27:20PM +0200, Dag-Erling Smorgrav wrote:
 
 Funnily, I experience a near-doubling of running time with similar
 patches.

Incidentally, it seems that it's not possible to assume that our
regex library is even anywhere in the same league as the GNU regex
library.

b$ time ./grep -E '(vt100)|(printer)' longfile > /dev/null

real    0m21.284s
user    0m22.034s
sys     0m0.083s

Now, with a profiled executable with optimization turned off it
takes about 25 seconds.  Regardless, it appears to spend 98% of
its time in regexec(), which is good, since that's where it should
be spending time.

[I had been intending to combine multiple patterns, ultimately
 combining in a '\n' to avoid the memchr() in mmopen].

b$ time grep '(vt100)|(printer)' longfile > /dev/null

real    0m0.267s
user    0m0.109s
sys     0m0.157s

98% * 20 = ~19...  Without an improved regex library, any mildly
complicated pattern will bring the new grep to its knees.

This could be the dfa helping GNU grep more than having a better
regexp library...  Probably both.

I wonder how well the devel/pcre port would do POSIX regular expressions.


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-29 Thread Tim Vanderhoek

On Thu, Jul 29, 1999 at 09:16:53PM +0900, Daniel C. Sobral wrote:
  
   Sorry, but a simplistic analysis like that just won't cut for grep.
   The algorithmic complexity is highly relevant here. Try this:
  
  Algorithmic complexity!?!
 
 Yup.

I'm sorry.  I've read your message and have decided that you're wrong.
Outside of the regexp library, algorithmic complexity is not a factor
here.  It would take a beanbag to write anything other than an O(N)
algorithm.

The proposed grep is slow, very slow, and I've sent a long message to
James outlining how to make it much faster, but algorithmic complexity
is not an issue.


 Also, fgetln() will copy the line buffer from time to time, though
 that's not a simple computation, and probably of little

fgetln() does a complete copy of the line buffer whenever an
excessively long line is found.  On this point, it's hard to do better
without using mmap(), but mmap() has its own disadvantages.  My last
suggestion to James was to assume a worst case for long lines and mark
the worst worst case with an XXX "this is unfortunate".


  The test you suggested doesn't show anything about that algorithmic
  complexity, though.
 
 Yeah? Try to back that with the results of the tests I suggested.

No, it's not even worth my time.

Now look.  You've gotten me so upset I actually went and did a simple
test.  The test showed I'm right and you're wrong.  Catting X copies
of /etc/termcap into longfile and then searching longfile for all
occurrences of "printer" costs grep an extra 0.03 seconds for every
repetition of /etc/termcap in longfile.

Gee, linear complexity wrt file length.  Who could've guessed!?

What'ya bet GNU grep also exhibits linear complexity?  :)

Admit it, you jumped in with some bullshit about complexity when, had
you taken the time to look into what James meant when he said "it now
spends 50% of its time in procline()", you would have kept quiet,
realizing that he was talking about a constant factor in the
complexity analysis, a subject where comments such as "it now spends
50% of its time in procline()" are relevant.

:-)

[Never mind that it should be spending near 100% of its time in
 procline...that just means he's still got work to do... :-]


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-29 Thread James Howard

On Thu, 29 Jul 1999, Tim Vanderhoek wrote:

 fgetln() does a complete copy of the line buffer whenever an
 excessively long line is found.  On this point, it's hard to do better
 without using mmap(), but mmap() has its own disadvantages.  My last
 suggestion to James was to assume a worst case for long lines and mark
 the worst worst case with an XXX "this is unfortunate".

<warning type="Anything said here wrong is my fault, not DES's">

DES tells me he has a new version (0.10) which mmap()s.  It supposedly
cuts the run time down significantly, but I do not have the numbers in
front of me.  Unfortunately he has not posted this version yet so I cannot
download it and run it myself.  He also says that if mmap fails, he drops
back to stdio.  This should only happen in the NFS case, the >2G case,
etc.

</warning>
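
[A minimal sketch of the "mmap(), else fall back" idea described above.  It is not DES's code: the struct and the function names are hypothetical, the fallback uses read(2) rather than stdio to keep the example short, and very large files are not treated specially.]

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for a file's contents. */
struct filebuf {
	char	*dat;
	size_t	 len;
	int	 mapped;	/* 1 if dat came from mmap(), 0 if from malloc() */
};

/* Load path into fb, preferring mmap().  Returns 0 on success, -1 on error. */
static int
load_file(const char *path, struct filebuf *fb)
{
	struct stat st;
	size_t cap = 65536, off = 0;
	ssize_t n;
	char *buf, *nb;
	void *p;
	int fd;

	if ((fd = open(path, O_RDONLY)) == -1)
		return (-1);
	if (fstat(fd, &st) == -1) {
		close(fd);
		return (-1);
	}
	fb->dat = NULL;
	fb->len = 0;
	fb->mapped = 0;
	/* Only regular, non-empty files are worth trying to map. */
	if (S_ISREG(st.st_mode) && st.st_size > 0) {
		p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
		if (p != MAP_FAILED) {
			fb->dat = p;
			fb->len = (size_t)st.st_size;
			fb->mapped = 1;
			close(fd);
			return (0);
		}
	}
	/* Fallback (pipes, failed mmap): read into a growing buffer. */
	if ((buf = malloc(cap)) == NULL) {
		close(fd);
		return (-1);
	}
	while ((n = read(fd, buf + off, cap - off)) > 0) {
		off += (size_t)n;
		if (off == cap) {
			if ((nb = realloc(buf, cap * 2)) == NULL) {
				free(buf);
				close(fd);
				return (-1);
			}
			buf = nb;
			cap *= 2;
		}
	}
	close(fd);
	if (n == -1) {
		free(buf);
		return (-1);
	}
	fb->dat = buf;
	fb->len = off;
	return (0);
}

/* Release whatever load_file() handed out. */
static void
release_file(struct filebuf *fb)
{
	if (fb->mapped)
		munmap(fb->dat, fb->len);
	else
		free(fb->dat);
	fb->dat = NULL;
	fb->len = 0;
}
```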

 [Never mind that it should be spending near 100% of its time in
  procline...that just means he's still got work to do... :-]

I'd rather see it spending 100% of its time in regexec(), then I can just
blame Henry Spencer :)

Someone said there was new regex code out, is this true?  Can anyone with
a copy test grep with it?

Jamie






Re: replacing grep(1)

1999-07-29 Thread Tim Vanderhoek

On Thu, Jul 29, 1999 at 07:05:57PM -0400, James Howard wrote:
 
 warning type="Anything said here wrong is my fault, not DES's"
 
 DES tells me he has a new version (0.10) which mmap()s.  It supposedly
 cuts the run time down significantly, I do not have the numbers in front
 of me.

I do.  Still far too slow.  I'll work on this tomorrow, since that
seems the only way to convince people that mmap is not such a big
win.  :-(

Hmm...  Maybe I'll even turn-out to be wrong.  ;-)  I really believe
mmap falls into the category of "might be nice, but not necessary and
does complicate things..."


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-29 Thread Matthew Dillon

:of me.  Unfortunately he has not posted this version yet so I cannot
:download it and run it myself.  He also says that if mmap fails, he drops
:back to stdio.  This should only happen in the NFS case, the >2G case,
:etc.

It should only be the >2G case or the pipe case.  mmap() works just fine
over NFS.

I would not expect a huge speed increase using mmap over read.  mmap()
tends to be a lot harder on the system than read() (though we are working
on that), especially if you are scanning large files.

Avoiding buffer copies is good, but keep in mind that the cost of accessing
a location in memory is essentially 0 if the memory is already in the L1
cache.  So while you may get an improvement going from read() to mmap(),
which avoids large buffer copies, you will not see much of an improvement
removing redundancy from the line scan.

-Matt
Matthew Dillon 
[EMAIL PROTECTED]





Re: replacing grep(1)

1999-07-29 Thread John-Mark Gurney

James Howard scribbled this message on Jul 29:
 On Thu, 29 Jul 1999, Tim Vanderhoek wrote:
 
  fgetln() does a complete copy of the line buffer whenever an
  excessively long line is found.  On this point, it's hard to do better
  without using mmap(), but mmap() has its own disadvantages.  My last
  suggestion to James was to assume a worst case for long lines and mark
  the worst worst case with an XXX "this is unfortunate".
 
 <warning type="Anything said here wrong is my fault, not DES's">
 
 DES tells me he has a new version (0.10) which mmap()s.  It supposedly
 cuts the run time down significantly, I do not have the numbers in front
 of me.  Unfortunately he has not posted this version yet so I cannot
 download it and run it myself.  He also says that if mmap fails, he drops
 back to stdio.  This should only happen in the NFS case, the >2G case,
 etc.
 
 </warning>
 
  [Never mind that it should be spending near 100% of its time in
   procline...that just means he's still got work to do... :-]
 
 I'd rather see it spending 100% of its time in regexec(), then I can just
 blame Henry Spencer :)
 
 Someone said there was new regex code out, is this true?  Can anyone with
 a copy test grep with it?

ok, I just made a patch to eliminate the copy that was happening in
procfile, and it sped up a grep of a 5meg termcap from about 2.9sec
down to .6 seconds... this includes time spent profiling the program..
GNU grep w/o profiling only takes .15sec so we ARE getting closer to
GNU grep...

it was VERY simple to do... and attached is the patch... this uses the
option REG_STARTEND to do what the copy was trying to do... all of the
code to use REG_STARTEND was already there, it just needed to be enabled..
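
[A minimal sketch of what REG_STARTEND buys: regexec() is told where the line starts and ends through pmatch[0], so the line neither has to be copied nor NUL-terminated.  REG_STARTEND is a non-POSIX extension found in the BSD regex library (glibc also provides it); the helper name match_span() is hypothetical and is not part of the patch.]

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: test whether re matches anywhere inside
 * buf[0..len).  With REG_STARTEND the bounds come from pm, so buf
 * can point straight into a larger buffer.
 * Returns 1 on match, 0 on no match, -1 on any other regexec() error.
 */
static int
match_span(const regex_t *re, const char *buf, size_t len)
{
	regmatch_t pm;
	int r;

	pm.rm_so = 0;
	pm.rm_eo = (regoff_t)len;
	r = regexec(re, buf, 1, &pm, REG_STARTEND);
	if (r == 0)
		return (1);
	return (r == REG_NOMATCH ? 0 : -1);
}
```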

enjoy!

-- 
  John-Mark Gurney  Voice: +1 541 684 8449
  Cu Networking   P.O. Box 5693, 97405

  "The soul contains in itself the event that shall presently befall it.
  The event is only the actualizing of its thought." -- Ralph Waldo Emerson


diff -u grep-0.10.orig/util.c grep-0.10/util.c
--- grep-0.10.orig/util.c	Thu Jul 29 05:00:15 1999
+++ grep-0.10/util.c	Thu Jul 29 16:38:06 1999
@@ -93,7 +93,6 @@
 	file_t *f;
 	str_t ln;
 	int c, t, z;
-	char *tmp;
 
 	if (fn == NULL) {
 		fn = "(standard input)";
@@ -119,13 +118,8 @@
 		initqueue();
 	for (c = 0; !(lflag && c);) {
 		ln.off = grep_tell(f);
-		if ((tmp = grep_fgetln(f, &ln.len)) == NULL)
+		if ((ln.dat = grep_fgetln(f, &ln.len)) == NULL)
 			break;
-		ln.dat = grep_malloc(ln.len + 1);
-		memcpy(ln.dat, tmp, ln.len);
-		ln.dat[ln.len] = 0;
-		if (ln.len > 0 && ln.dat[ln.len - 1] == '\n')
-			ln.dat[--ln.len] = 0;
 		ln.line_no++;
 
 		z = tail;
@@ -133,7 +127,6 @@
 			enqueue(ln);
 			linesqueued++;
 		}
-		free(ln.dat);
 		c += t;
 	}
 	if (Bflag > 0)
@@ -174,7 +167,8 @@
 	pmatch.rm_so = 0;
 	pmatch.rm_eo = l->len;
 	for (c = i = 0; i < patterns; i++) {
-		r = regexec(&r_pattern[i], l->dat, 0, &pmatch, eflags);
+		r = regexec(&r_pattern[i], l->dat, 0, &pmatch,
+		    eflags | REG_STARTEND);
 		if (r == REG_NOMATCH && t == 0)
 			continue;
 		if (wflag && r == 0) {



Re: replacing grep(1)

1999-07-29 Thread Daniel C. Sobral
Tim Vanderhoek wrote:
 
 On Thu, Jul 29, 1999 at 01:59:45AM +0900, Daniel C. Sobral wrote:
 
  Sorry, but a simplistic analysis like that just won't cut for grep.
  The algorithmic complexity is highly relevant here. Try this:
 
 Algorithmic complexity!?!

Yup.

 It's a freaking grep application.  There is no freaking algorithmic
 complexity.

Pattern matching is one of the prime examples of algorithmic
complexity.

You can add complexity very trivially.

 At least not outside of our regex library, anyways.  

I had not looked at the source, so I didn't know exactly how the
application did its stuff.

Now I did, and I'll comment. Let's say the number of patterns is N,
and the total number of characters to be examined is S. Let's call
the unmodified complexity C, just for the sake of simplifying the
comparison using a dangerous simplification.

First, the new grep uses fgetln(). fgetln() searches for a new line.
So each character is examined (at least) twice. That's C+S*read
already. GNU Grep uses mmap() (or read(), but not in FreeBSD), so it
doesn't incur this additional cost.

Also, fgetln() will copy the line buffer from time to time, though
that's not a simple computation, and probably of little
significance.

In addition to that, the new grep copies the fgetln() result each
time. Add S*copy to C.
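
[A sketch of the alternative to both costs listed above: walking the lines of a buffer in place, so that each byte is scanned once by memchr() and nothing is copied.  The function name next_line() is hypothetical and comes from neither grep; the buffer is assumed to be already in memory, e.g. from mmap() or read().]

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical iterator over the lines of [*cur, end).
 * Returns a pointer to the next line and stores its length (without
 * the newline) in *len, or returns NULL when the buffer is exhausted.
 * The returned line is NOT NUL-terminated and is not copied.
 */
static const char *
next_line(const char **cur, const char *end, size_t *len)
{
	const char *line = *cur;
	const char *nl;

	if (line >= end)
		return (NULL);
	nl = memchr(line, '\n', (size_t)(end - line));
	if (nl != NULL) {
		*len = (size_t)(nl - line);
		*cur = nl + 1;
	} else {
		/* Last line without a trailing newline. */
		*len = (size_t)(end - line);
		*cur = end;
	}
	return (line);
}
```

Because the lines are not NUL-terminated, a matcher used with this has to accept an explicit length.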

Next, the new grep tests the lines against each pattern separately!
GNU grep doesn't.

That's just *outside* the regexp library. Now, whether the
complexity is inside or outside the regexp library, I don't care.
It's complexity all the same. So it *must* be factored in.

 The test you
 suggested doesn't show anything about that algorithmic complexity,
 though.

Yeah? Try to back that with the results of the tests I suggested.

 If we have a slow regex library, though, I would consider that a
 separate problem from a slow grep.

If the f*cking grep is f*cking slow, I don't give a f*ck where the
problem is located! It just *IS*. GNU grep uses gnu regexp library,
the new grep uses our own. If changing greps means changing to a
library whose algorithm complexity is greater, then that *DOES*
count against the change.

For instance, a quick browse over GNU grep's source shows the gnu regexp
library can factor in multiple patterns. That is not being done by
the new grep. Does our regexp library support that?

Now, here is a quick and dirty fix for the repeated malloc()/free().
Notice that this is what fgetln() does, in fact. I'm afraid, though,
that this is nowhere near what would be needed to put the new grep
anywhere near the league of GNU grep.
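
[A standalone sketch of the same buffer-reuse idea, with one difference worth noting: the capacity is recorded whenever the buffer grows.  As posted, the patch below does not appear to update bufsize after grep_realloc().  The struct and function names here are hypothetical; the initial size of 81 is borrowed from the patch.]

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reusable line buffer. */
struct linebuf {
	char	*dat;
	size_t	 len;	/* length of the stored line, without NUL */
	size_t	 cap;	/* bytes allocated for dat */
};

/*
 * Copy src[0..len) into lb, dropping one trailing newline, and grow
 * the buffer only when it is too small.  Returns 0, or -1 if out of
 * memory.  Start with a zeroed struct; free(lb->dat) when done.
 */
static int
linebuf_set(struct linebuf *lb, const char *src, size_t len)
{
	if (len > 0 && src[len - 1] == '\n')
		len--;
	if (lb->cap < len + 1) {
		size_t ncap = lb->cap ? lb->cap : 81;
		char *nd;

		while (ncap < len + 1)
			ncap *= 2;
		if ((nd = realloc(lb->dat, ncap)) == NULL)
			return (-1);
		lb->dat = nd;
		lb->cap = ncap;		/* remember the new capacity */
	}
	memcpy(lb->dat, src, len);
	lb->dat[len] = '\0';
	lb->len = len;
	return (0);
}
```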

I like the idea of readable code, I like the idea of a BSD
license, but it would be damn silly to replace a clearly superior
grep, and that's where the thing stands right now.

--- util.c.orig	Thu Jul 29 19:14:17 1999
+++ util.c	Thu Jul 29 20:49:16 1999
@@ -107,6 +107,8 @@
 
 	ln.file = fn;
 	ln.line_no = 0;
+	ln.bufsize = 81; /* Magical constants, yeah! */
+	ln.dat = grep_malloc(81);
 	linesqueued = 0;
 
 	if (Bflag > 0)
@@ -115,11 +117,14 @@
 		ln.off = grep_tell();
 		if ((tmp = grep_getln(&ln.len)) == NULL)
 			break;
-		ln.dat = grep_malloc(ln.len + 1);
+		if (ln.bufsize < ln.len + 1)
+			ln.dat = grep_realloc(ln.dat, ln.len + 1);
 		memcpy(ln.dat, tmp, ln.len);
-		ln.dat[ln.len] = 0;
 		if (ln.len > 0 && ln.dat[ln.len - 1] == '\n')
 			ln.dat[--ln.len] = 0;
+		else
+			ln.dat[ln.len] = 0;
+
 		ln.line_no++;
 
 		z = tail;
@@ -127,9 +132,9 @@
 			enqueue(ln);
 			linesqueued++;
 		}
-		free(ln.dat);
 		c += t;
 	}
+	free(ln.dat);
 	if (Bflag > 0)
 		clearqueue();
 	grep_close();
--- grep.h.orig	Thu Jul 29 20:47:52 1999
+++ grep.h	Thu Jul 29 20:48:34 1999
@@ -35,6 +35,7 @@
 
 typedef struct {
 	size_t	 len;
+	size_t	 bufsize;
 	int	 line_no;
 	int	 off;
 	char	*file;


--
Daniel C. Sobral(8-DCS)
d...@newsguy.com
d...@freebsd.org

Is it true that you're a millionaire's son who never worked a day
in your life?
Yeah, I guess so.
Lemme tell you, son, you ain't missed a thing.






Re: replacing grep(1)

1999-07-29 Thread John-Mark Gurney
Tim Vanderhoek scribbled this message on Jul 29:
 On Thu, Jul 29, 1999 at 07:05:57PM -0400, James Howard wrote:
  
  warning type=Anything said here wrong is my fault, not DES's
  
  DES tells me he has a new version (0.10) which mmap()s.  It supposedly
  cuts the run time down significantly, I do not have the numbers in front
  of me.
 
 I do.  Still far too slow.  I'll work on this tomorrow, since that
 seems the only way to convince people that mmap is not such a big
 win.  :-(

I just managed to get a fivefold speed increase by removing an
unnecessary copy...   and now, grep spends 50% of its time in regexec,
37.2% of its time in mmfgetln, and this is because of the scanning for a
new line character...

 Hmm...  Maybe I'll even turn-out to be wrong.  ;-)  I really believe
 mmap falls into the category of might be nice, but not necessary and
 does complicate things...

I think it is a big win...  it shaved off around a half second from
3 seconds down to 2 and a half seconds...

-- 
  John-Mark Gurney  Voice: +1 541 684 8449
  Cu Networking   P.O. Box 5693, 97405

  The soul contains in itself the event that shall presently befall it.
  The event is only the actualizing of its thought. -- Ralph Waldo Emerson





Re: replacing grep(1)

1999-07-28 Thread Dag-Erling Smorgrav

Sheldon Hearn [EMAIL PROTECTED] writes:
 In this case, the implementation we'll be introducing will introduce a
 performance loss, not a gain.

Can you document that?

   As far as stability goes, there's a loss
 involved _if_ passing the GNU grep regression tests is important.

Do you mean that Jamie's implementation doesn't pass those regression
tests? If it doesn't, we can fix it before importing it into the tree.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]





Re: replacing grep(1)

1999-07-28 Thread Tim Vanderhoek

On Wed, Jul 28, 1999 at 03:30:58AM -0400, Dag-Erling Smorgrav wrote:
 
  There seems to be at least one dependency on GNU grep in
  /ports/Mk/bsd.port.mk where the -F argument is used.
 
 -F is implemented.

I saw that, but had assumed the semantics were different.  I should
have read the manpages more closely: they're not.  Sorry.


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-28 Thread Warner Losh

In message [EMAIL PROTECTED] "David O'Brien" writes:
: Before importing, it must display a version number of 1.0 (or drop the
: version number).  This is not Linux where everything is version 0.xy.

For a long time the new boot loader was in the tree with a version
0.xx...

Warner





Re: replacing grep(1)

1999-07-28 Thread Mark Dickey

I expect that there is a very good reason why this shouldn't be done,
but would it be possible to implement two different algorithms (or code
paths), chosen depending on the size of the file being grepped?

Mark Dickey
[EMAIL PROTECTED]

Daniel C. Sobral wrote:
 James Howard wrote:
 
  Due to the discussion of speed, I have been looking at it and it is
  really slow.  Even slower than I thought and I was thinking it was
  pretty slow.
 
  So using gprof, I have discovered that it seems to spend a whole mess of
  time in grep_malloc() and free().  So I pulled all the references to
  malloc inside the main loop (the copy for ln.dat and removed queueing).
  This stills leaves us with a grep that is about ~6x slower than GNU.
  Before that, it ran closer to 80x.  After this, gprof says it spends
  around 53% of its time in procline().

 Sorry, but a simplistic analysis like that just won't cut for grep.
 The algorithmic complexity is highly relevant here. Try this:
 generate a 1 Mb file, and then generate 10 Mb and 50 Mb files by
 concatenating that first file. Benchmark yours and GNU grep a number
 of times to get the average for each file. Now compare the
 *proportions* between the different sized files. Are they the same?

 Next, try different sized patterns on the 50 Mb file on both yours
 and GNU grep. Again, compare the proportion.

 Next, compare patterns with different number of "wildcards",
 patterns with things like [acegikmoqsuvxz] vs
 [acegikmoqsuvxzACEGIKMOQSUVXZ], etc.

 Either that, or do a complexity analysis of the algorithms. :-)

 (In case anyone reading this discussion wants to know more about
 complexity of algorithms, I recommend Computer Algorithms,
 Introduction to Design and Analysis, by Sara Baase, Addison Wesley.)

 --
 Daniel C. Sobral (8-DCS)
 [EMAIL PROTECTED]
 [EMAIL PROTECTED]

 "Is it true that you're a millionaire's son who never worked a day
 in your life?"
 "Yeah, I guess so."
 "Lemme tell you, son, you ain't missed a thing."






Re: replacing grep(1)

1999-07-28 Thread Peter Jeremy

Doug [EMAIL PROTECTED] wrote:
 The more complete the feature set, the better
 off we are for my money.
Someone offering money?  Quick, who's got the donations hat... :-)

Peter





Re: replacing grep(1)

1999-07-28 Thread Tim Vanderhoek

On Thu, Jul 29, 1999 at 01:59:45AM +0900, Daniel C. Sobral wrote:
 
 Sorry, but a simplistic analysis like that just won't cut for grep.
 The algorithmic complexity is highly relevant here. Try this:

Algorithmic complexity!?!

It's a freaking grep application.  There is no freaking algorithmic
complexity.

At least not outside of our regex library, anyways.  The test you
suggested doesn't show anything about that algorithmic complexity,
though.

If we have a slow regex library, though, I would consider that a
separate problem from a slow grep.


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-28 Thread Dag-Erling Smorgrav
Brian F. Feldman gr...@freebsd.org writes:
 That's true. I'd like to see the replacement grep do mmaping of the
 input files if it doesn't already, as that would speed it up.

Shouldn't be too hard to implement, the way file operations are
abstracted. Patches? :)
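
For readers wondering what mmaping the input would involve, here is a minimal sketch. It is not a patch against grep-0.7: `map_file` and `unmap_file` are hypothetical names, and the only things assumed are the standard open(2), fstat(2), mmap(2) and munmap(2) calls.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a whole file read-only.  Returns the mapping and stores its
 * length in *len.  Returns NULL for an empty file, a non-regular file
 * or a failing system call, so the caller can fall back to read(2).
 */
char *
map_file(const char *path, size_t *len)
{
	struct stat st;
	void *p;
	int fd;

	if ((fd = open(path, O_RDONLY)) == -1)
		return (NULL);
	if (fstat(fd, &st) == -1 || !S_ISREG(st.st_mode) || st.st_size == 0) {
		close(fd);
		return (NULL);
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);		/* the mapping stays valid after close */
	if (p == MAP_FAILED)
		return (NULL);
	*len = (size_t)st.st_size;
	return (p);
}

/* Release a mapping obtained from map_file(). */
void
unmap_file(char *p, size_t len)
{
	munmap(p, len);
}
```

Whether this is actually faster than buffered reads is exactly the kind of thing that would need measuring; the sketch only shows that the plumbing is small.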

DES
-- 
Dag-Erling Smorgrav - d...@yes.no





Re: replacing grep(1)

1999-07-28 Thread Dag-Erling Smorgrav
Sheldon Hearn sheld...@uunet.co.za writes:
 In this case, the implementation we'll be introducing will introduce a
 performance loss, not a gain.

Can you document that?

   As far as stability goes, there's a loss
 involved _if_ passing the GNU grep regression tests is important.

Do you mean that Jamie's implementation doesn't pass those regression
tests? If they don't, we can fix it before importing it into the tree.

DES
-- 
Dag-Erling Smorgrav - d...@yes.no





Re: replacing grep(1)

1999-07-28 Thread Dag-Erling Smorgrav
Tim Vanderhoek vand...@ecf.utoronto.ca writes:
 Have you run your systems with J-grep as a replacement for GNU grep
 for a while (making sure nothing breaks)?

Yes.

 There seems to be at least one dependency on GNU grep in
 /ports/Mk/bsd.port.mk where the -F argument is used.

-F is implemented.

DES
-- 
Dag-Erling Smorgrav - d...@yes.no





Re: replacing grep(1)

1999-07-28 Thread Tim Vanderhoek
On Wed, Jul 28, 1999 at 03:30:58AM -0400, Dag-Erling Smorgrav wrote:
 
  There seems to be at least one dependency on GNU grep in
  /ports/Mk/bsd.port.mk where the -F argument is used.
 
 -F is implemented.

I saw that, but had assumed the semantics were different.  I should
have read the manpages more closely: they're not.  Sorry.


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-28 Thread Warner Losh
In message 19990727214451.a66...@dragon.nuxi.com David O'Brien writes:
: Before importing, it must display a version number of 1.0 (or drop the
: version number).  This is not Linux where everything is version 0.xy.

For a long time the new boot loader was in the tree with a version
0.xx...

Warner





Re: replacing grep(1)

1999-07-28 Thread Daniel C. Sobral
James Howard wrote:
 
 Due to the discussion of speed, I have been looking at it and it is really
 slow.  Even slower than I thought and I was thinking it was pretty slow.
 
 So using gprof, I have discovered that it seems to spend a whole mess of
 time in grep_malloc() and free().  So I pulled all the references to
 malloc inside the main loop (the copy for ln.dat and removed queueing).
 This still leaves us with a grep that is about ~6x slower than GNU.
 Before that, it ran closer to 80x.  After this, gprof says it spends
 around 53% of its time in procline().

Sorry, but a simplistic analysis like that just won't cut it for grep.
The algorithmic complexity is highly relevant here. Try this:
generate a 1 Mb file, and then generate 10 Mb and 50 Mb files by
concatenating that first file. Benchmark yours and GNU grep a number
of times to get the average for each file. Now compare the
*proportions* between the different sized files. Are they the same?

Next, try different sized patterns on the 50 Mb file on both yours
and GNU grep. Again, compare the proportion.

Next, compare patterns with different number of wildcards,
patterns with things like [acegikmoqsuvxz] vs
[acegikmoqsuvxzACEGIKMOQSUVXZ], etc.

Either that, or do a complexity analysis of the algorithms. :-)

(In case anyone reading this discussion wants to know more about
complexity of algorithms, I recommend Computer Algorithms,
Introduction to Design and Analysis, by Sara Baase, Addison Wesley.)
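
To make the pattern-length experiment concrete: one family of algorithms whose cost depends on pattern length is skip-based searching such as Boyer-Moore-Horspool. The sketch below only illustrates that idea for fixed strings. It makes no claim about what GNU grep or our regex library actually do, and `horspool_find` and its comparison counter are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Boyer-Moore-Horspool search for pat (m bytes) in text (n bytes).
 * Returns the index of the first match or -1.  *compared is set to the
 * number of byte comparisons performed, so runs can be compared.
 */
long
horspool_find(const char *text, size_t n, const char *pat, size_t m,
    size_t *compared)
{
	size_t shift[UCHAR_MAX + 1];
	size_t i, j, pos = 0;

	*compared = 0;
	if (m == 0)
		return (0);
	if (m > n)
		return (-1);
	/* A byte that is not in the pattern lets us skip m positions. */
	for (i = 0; i <= UCHAR_MAX; i++)
		shift[i] = m;
	for (i = 0; i + 1 < m; i++)
		shift[(unsigned char)pat[i]] = m - 1 - i;

	while (pos + m <= n) {
		/* Compare right to left within the current window. */
		for (j = m; j > 0; j--) {
			(*compared)++;
			if (text[pos + j - 1] != pat[j - 1])
				break;
		}
		if (j == 0)
			return ((long)pos);
		pos += shift[(unsigned char)text[pos + m - 1]];
	}
	return (-1);
}
```

On text that never contains the last byte of the pattern, each window costs one comparison and the window advances by the full pattern length, so a longer pattern means fewer comparisons. That is the kind of proportion the benchmark above would expose.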

--
Daniel C. Sobral(8-DCS)
d...@newsguy.com
d...@freebsd.org

Is it true that you're a millionaire's son who never worked a day
in your life?
Yeah, I guess so.
Lemme tell you, son, you ain't missed a thing.






Re: replacing grep(1)

1999-07-28 Thread Mark Dickey
I expect that there is a very good reason why this shouldn't be done,
but could it be possible to implement two different algorithms/code
dependent on the size of the file being grepped?

Mark Dickey
m...@bestweb.net

Daniel C. Sobral wrote:
 James Howard wrote:
 
  Due to the discussion of speed, I have been looking at it and it is
really
  slow.  Even slower than I thought and I was thinking it was pretty slow.
 
  So using gprof, I have discovered that it seems to spend a whole mess of
  time in grep_malloc() and free().  So I pulled all the references to
  malloc inside the main loop (the copy for ln.dat and removed queueing).
  This still leaves us with a grep that is about ~6x slower than GNU.
  Before that, it ran closer to 80x.  After this, gprof says it spends
  around 53% of its time in procline().

 Sorry, but a simplistic analysis like that just won't cut for grep.
 The algorithmic complexity is highly relevant here. Try this:
 generate a 1 Mb file, and then generate 10 Mb and 50 Mb files by
 concatenating that first file. Benchmark yours and GNU grep a number
 of times to get the average for each file. Now compare the
 *proportions* between the different sized files. Are they the same?

 Next, try different sized patterns on the 50 Mb file on both yours
 and GNU grep. Again, compare the proportion.

 Next, compare patterns with different number of wildcards,
 patterns with things like [acegikmoqsuvxz] vs
 [acegikmoqsuvxzACEGIKMOQSUVXZ], etc.

 Either that, or do a complexity analysis of the algorithms. :-)

 (In case anyone reading this discussion wants to know more about
 complexity of algorithms, I recommend Computer Algorithms,
 Introduction to Design and Analysis, by Sara Baase, Addison Wesley.)

 --
 Daniel C. Sobral (8-DCS)
 d...@newsguy.com
 d...@freebsd.org

 Is it true that you're a millionaire's son who never worked a day
 in your life?
 Yeah, I guess so.
 Lemme tell you, son, you ain't missed a thing.










Re: replacing grep(1)

1999-07-27 Thread Sheldon Hearn



On 27 Jul 1999 13:37:35 +0200, Dag-Erling Smorgrav wrote:

  URL:http://www.freebsd.org/~des/software/grep-0.7.tar.gz
 
 I move that we replace GNU grep in our source tree with this
 implementation, once it's been reviewed by all concerned parties.

When I committed the port (textproc/freegrep), Jamie assured me that
he'd keep me updated on the progress of his software. That was the last
I heard of it, and the port is still sitting at version 0.3.

Version 0.3 broke port-building badly. Does version 0.7 make it through
a build of a whole stack of ports?

Ciao,
Sheldon.





Re: replacing grep(1)

1999-07-27 Thread Dag-Erling Smorgrav

Sheldon Hearn [EMAIL PROTECTED] writes:
 Version 0.3 broke port-building badly. Does version 0.7 make it through
 a build of a whole stack of ports?

Yes.

DES
-- 
Dag-Erling Smorgrav - [EMAIL PROTECTED]





Re: replacing grep(1)

1999-07-27 Thread Soren Schmidt

It seems Dag-Erling Smorgrav wrote:
 Jamie Howard ([EMAIL PROTECTED]), with a little help from yours
 truly, has written a BSD-licensed version of grep(1) which has all the
 functionality of our current (GPLed) implementation, plus a little
 more, in one seventh the source code and one fourth the binary code.
 What's more, the code is actually possible for mere mortals to read
 and understand.
 
 The source code is available for download from freefall:
 
  URL:http://www.freebsd.org/~des/software/grep-0.7.tar.gz
 
 I move that we replace GNU grep in our source tree with this
 implementation, once it's been reviewed by all concerned parties.

Go for it, the more GNU stuff we nuke the better :)

-Søren





Re: replacing grep(1)

1999-07-27 Thread Brian F. Feldman

On Tue, 27 Jul 1999, Soren Schmidt wrote:

 It seems Dag-Erling Smorgrav wrote:
  
  I move that we replace GNU grep in our source tree with this
  implementation, once it's been reviewed by all concerned parties.
 
 Go for it, the more GNU stuff we nuke the better :)
 
 -Søren
 

Geez, why don't we just write our own compiler and linker, assembler,
and everything? Let's get every last bit of GNU out of our system, for
no reason! This kind of NIH is not necessary, and only hurts us by
misdirecting our energies.
/joking

Seriously, I'd love for this to happen. Most GNU software is a hopeless,
gruesome mess that should be dragged out and shot. Getting rid of as
much as possible, gradually, is a Very Good Thing; this is how we get
stability and performance improvements.


 Brian Fundakowski Feldman  _ __ ___   ___ ___ ___  
 [EMAIL PROTECTED]   _ __ ___ | _ ) __|   \ 
 FreeBSD: The Power to Serve!_ __ | _ \._ \ |) |
   http://www.FreeBSD.org/  _ |___/___/___/ 






Re: replacing grep(1)

1999-07-27 Thread Tim Vanderhoek

On Tue, Jul 27, 1999 at 01:37:35PM +0200, Dag-Erling Smorgrav wrote:
 
 I move that we replace GNU grep in our source tree with this
 implementation, once it's been reviewed by all concerned parties.

Have you run your systems with J-grep as a replacement for GNU grep
for a while (making sure nothing breaks)?

There seems to be at least one dependency on GNU grep in
/ports/Mk/bsd.port.mk where the -F argument is used.

How's it compare in speed?  [I'd test it myself, but see my private
email...]


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-27 Thread Sheldon Hearn



On Tue, 27 Jul 1999 08:19:38 -0400, "Brian F. Feldman" wrote:

 Getting rid of as much as possible, gradually, is a Very Good Thing;
 this is how we get stability and performance improvements.

Only if the replacements are as stable and robust as their predecessors.

In this case, the implementation we'll be introducing will introduce a
performance loss, not a gain. As far as stability goes, there's a loss
involved _if_ passing the GNU grep regression tests is important.

Don't get me wrong. I'm all for replacing GNU software. Let's just be
realistic and keep in mind that being non-GNU doesn't necessarily mean
better.

In this case, I'm all for the change, since I don't use grep for serious
regex work and the readability gain outweighs any loss of performance.
You probably feel the same way. Our opinions are those of developers,
though. It's always worth remembering that.

Ciao,
Sheldon.





Re: replacing grep(1)

1999-07-27 Thread Jamie Howard

On Tue, 27 Jul 1999, Nickolay N. Dudorov wrote:

   After making it on the CURRENT system I can only
 see:
 
   grep: filename: Undefined error: 0
 
 for every filename.

Every file?

 
   This caused by very "unusual" return values for
 'grep_open' (and other '..._open') function which is declared
 as 'int' (and return int result) and compared with NULL ;-(
 
   I prefer not to include the patch for this because
 I am incompatible with such tricks as:
 
   return ((f = fopen(path, mode)) != NULL) - 1;

This was done this way because the gzopen and fopen both return pointers
of different types.  Maybe the best thing would be to have grep_open()
return a void pointer since procfile() doesn't keep track of what files
are open and not.  This is ugly and not very reusable, but then again how
many programs need transparent access to both gzip'd and plaintext files?
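
For anyone following along, here is a minimal sketch of both halves of this exchange. `open_status` wraps the quoted idiom to show what it evaluates to: 0 on success and -1 on failure, so a caller that tests the result against NULL (that is, 0) takes the error path on every successful open, with errno still 0. The tagged wrapper is one possible way to hand back a pointer for either file type. `struct grep_file`, `grep_file_open` and `grep_file_close` are hypothetical names, not the grep-0.7 interface, and the gzip branch is only indicated in a comment so the sketch needs nothing beyond stdio.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* The quoted idiom: 0 when fopen() succeeds, -1 when it fails. */
int
open_status(FILE **f, const char *path, const char *mode)
{
	return (((*f = fopen(path, mode)) != NULL) - 1);
}

enum file_kind { FILE_PLAIN, FILE_GZIP };

/* Hypothetical wrapper type; not taken from grep-0.7. */
struct grep_file {
	enum file_kind kind;
	void *handle;	/* FILE * here; a gzFile would go here for FILE_GZIP */
};

/* Returns NULL on failure, with errno left as set by malloc() or fopen(). */
struct grep_file *
grep_file_open(const char *path)
{
	struct grep_file *gf;
	int saved;

	if ((gf = malloc(sizeof(*gf))) == NULL)
		return (NULL);
	gf->kind = FILE_PLAIN;
	if ((gf->handle = fopen(path, "r")) == NULL) {
		saved = errno;	/* free() is allowed to change errno */
		free(gf);
		errno = saved;
		return (NULL);
	}
	return (gf);
}

/* Close the underlying stream and release the wrapper. */
void
grep_file_close(struct grep_file *gf)
{
	if (gf->kind == FILE_PLAIN)
		fclose(gf->handle);
	free(gf);
}
```

With a pointer return, the natural `== NULL` test in the caller means what it looks like it means.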

Jamie






Re: replacing grep(1)

1999-07-27 Thread Tim Vanderhoek

On Tue, Jul 27, 1999 at 08:23:44AM -0400, Tim Vanderhoek wrote:
 
 How's it compare in speed?  [I'd test it myself, but see my private
 email...]

Okay, following-up on myself, and indirectly Sheldon,

It does seem a little too slow.  I'm not sure that this is because it
doesn't use mmap.  Supposedly the merged buffer/vm means mmap doesn't
make as large a difference as it used to.

On a file with 10+ lines, the speed difference is rather restrictive.
Looking over the gprof output, I think its authors (or some other
intrepid hacker) will find ways to speed it up.  Only about 10% of
the time is spent in procline().  There seems to be a lot of
unnecessary strncpy() that could be _easily_ avoided if free() on
util.c:130 was avoided, but I'll let the authors speak first.  :-)
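
As an illustration of the general idea (keep one buffer alive across lines instead of allocating and freeing per line), here is a minimal sketch. It is not the code in util.c, which isn't quoted in this thread; `struct linebuf`, `linebuf_set` and `linebuf_free` are hypothetical names, and the initial capacity of 64 is arbitrary.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reusable line buffer; the names are not from grep-0.7. */
struct linebuf {
	char *dat;
	size_t len;	/* bytes in use, excluding the terminating NUL */
	size_t cap;	/* bytes allocated */
};

/*
 * Copy len bytes of src into lb, growing the allocation only when the
 * line does not fit.  Returns 0 on success, -1 if realloc() fails (lb
 * is left unchanged in that case).
 */
int
linebuf_set(struct linebuf *lb, const char *src, size_t len)
{
	if (len + 1 > lb->cap) {
		size_t ncap = lb->cap ? lb->cap : 64;
		char *p;

		while (ncap < len + 1)
			ncap *= 2;	/* geometric growth keeps reallocs rare */
		if ((p = realloc(lb->dat, ncap)) == NULL)
			return (-1);
		lb->dat = p;
		lb->cap = ncap;
	}
	memcpy(lb->dat, src, len);
	lb->dat[len] = '\0';
	lb->len = len;
	return (0);
}

/* Release the buffer once, after the last line has been processed. */
void
linebuf_free(struct linebuf *lb)
{
	free(lb->dat);
	lb->dat = NULL;
	lb->len = lb->cap = 0;
}
```

After the first few lines the buffer is as large as the longest line seen so far, and the per-line cost drops to a single memcpy().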


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-27 Thread Sheldon Hearn



On Tue, 27 Jul 1999 23:18:14 +0900, "Daniel C. Sobral" wrote:

 I'm talking about cpdup, which can be found in
 http://www.backplane.com/FreeBSD/. Someone posted a port at the
 time, but I don't know if anyone ever committed the port.

I'll commit a port in the next few days.

Ciao,
Sheldon.





Re: replacing grep(1)

1999-07-27 Thread Garance A Drosihn

At 9:29 AM -0400 7/27/99, Tim Vanderhoek wrote:
 On a file with 10+ lines, the speed difference is rather
 restrictive. [...] Only about 10% of the time is spent in
 procline().  There seems to be a lot of unnecessary strncpy()
 that could be _easily_ avoided if free() on util.c:130 was
 avoided, but I'll let the authors speak first.  :-)

Hmm, strncpy?  Are these calls which really want strncpy
for what it was originally designed for, or are they just
trying to prevent buffer overruns?

If it's the buffer-overrun answer, then maybe this would
be a good test case for using strlcpy instead of strncpy,
and see if it makes a performance difference (since the
code won't waste its time nulling-out bytes that don't
need to be nulled-out).
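
A small sketch of the difference being described: strncpy() fills the rest of the destination with NUL bytes, while a strlcpy-style copy writes one terminator and stops. `nuls_written_by_strncpy` and `bounded_copy` are hypothetical helpers written for this illustration. `bounded_copy` follows the strlcpy(3) contract as I understand it (it returns strlen(src)) but is a stand-in, not the libc routine.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the NUL bytes strncpy() leaves in the first dstsize bytes of a
 * scratch buffer that was pre-filled with 'X'.  dstsize is capped at
 * the scratch buffer size.
 */
size_t
nuls_written_by_strncpy(const char *src, size_t dstsize)
{
	char buf[256];
	size_t i, n = 0;

	if (dstsize > sizeof(buf))
		dstsize = sizeof(buf);
	memset(buf, 'X', sizeof(buf));
	strncpy(buf, src, dstsize);
	for (i = 0; i < dstsize; i++)
		if (buf[i] == '\0')
			n++;
	return (n);
}

/*
 * Bounded copy that writes a single terminator and nothing after it.
 * Returns strlen(src), so truncation shows up as a result >= dstsize.
 */
size_t
bounded_copy(char *dst, const char *src, size_t dstsize)
{
	size_t srclen = strlen(src);

	if (dstsize != 0) {
		size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return (srclen);
}
```

Copying a 2-byte string into a 256-byte buffer with strncpy() writes 254 padding bytes. Whether that shows up in a grep profile depends on how large the destination buffers are compared with typical lines, which the thread doesn't tell us.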


---
Garance Alistair Drosehn   =   [EMAIL PROTECTED]
Senior Systems Programmer  or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute





Re: replacing grep(1)

1999-07-27 Thread Robert Nordier

 Jamie Howard ([EMAIL PROTECTED]), with a little help from yours
 truly, has written a BSD-licensed version of grep(1) which has all the
 functionality of our current (GPLed) implementation, plus a little
 more, in one seventh the source code and one fourth the binary code.

 I move that we replace GNU grep in our source tree with this
 implementation, once it's been reviewed by all concerned parties.

A couple of general problems:

o  Too many diagnostics have "Undefined error: 0" appended.
   Particularly in the case of "err(2, re_error)" in file.c,
   you probably want to look at using errx() instead.

o  Errors other than "no match" need to return an exit status
   of 2: some in file.c and util.c are returning 1.
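
Both points can be sketched together. The err(3) family appends strerror(errno) to the message, while the errx(3)/warnx(3) variants do not, so a regex compile failure (which doesn't set errno) should go through the latter. The sketch uses warnx() and returns the status instead of calling errx(2, ...) so it can be exercised without exiting. `compile_pattern` and `EXIT_TROUBLE` are hypothetical names, and nothing here is taken from file.c.

```c
#include <err.h>
#include <regex.h>

#define EXIT_TROUBLE 2	/* hypothetical name for grep's "error" status */

/*
 * Compile pat into re.  On failure, report regerror()'s text without a
 * strerror(errno) suffix and return the status grep should exit with.
 * Returns 0 on success.
 */
int
compile_pattern(regex_t *re, const char *pat, int cflags)
{
	char msg[256];
	int rc;

	if ((rc = regcomp(re, pat, cflags)) != 0) {
		regerror(rc, re, msg, sizeof(msg));
		warnx("%s", msg);	/* errx(EXIT_TROUBLE, "%s", msg) in real code */
		return (EXIT_TROUBLE);
	}
	return (0);
}
```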

A more general concern is whether Henry Spencer's regex routines
-- at least in our present "alpha-quality" version -- are up to
supporting a grep without much further debugging.  I don't recall
many of the problems I found when I last looked at these, though
here are two, after 5 minutes playing:

echo xx | grep '\(x\{1,2\}\)\1'
echo x | grep '[--x]'
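
For anyone who wants to try patterns like these against the library directly rather than through grep, a minimal harness along these lines should do. `bre_matches` is a hypothetical helper; it uses only regcomp(3), regexec(3) and regfree(3) with basic (non-extended) syntax, which is my assumption about what plain grep passes.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Match text against a basic regular expression.
 * Returns 1 on a match, 0 on no match, -1 if the pattern won't compile.
 */
int
bre_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, 0) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0);
}
```

Calling `bre_matches("\\(x\\{1,2\\}\\)\\1", "xx")` and `bre_matches("[--x]", "x")` reproduces the two cases above. By my reading of POSIX both should match, so anything other than 1 would point at the library, but note the message doesn't say what result was actually observed.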

--
Robert Nordier





Re: replacing grep(1)

1999-07-27 Thread Julian Elischer



On Tue, 27 Jul 1999, Brian F. Feldman wrote:

 On Tue, 27 Jul 1999, Soren Schmidt wrote:
 
  It seems Dag-Erling Smorgrav wrote:
   
   I move that we replace GNU grep in our source tree with this
   implementation, once it's been reviewed by all concerned parties.
  
  Go for it, the more GNU stuff we nuke the better :)
  
  -Søren
  
 
 Geez, why don't we just write our own compiler and linker, assembler,
 and everything? Let's get every last bit of GNU out of our system, for
 no reason! This kind of NIH is not necessary, and only hurts us by
 misdirecting our energies.
 /joking

Actually there is a difference between grep and gcc.

You wouldn't ship cc on a binary-only embedded system,
but you might want to ship grep (so that control scripts can use it).

 
 Seriously, I'd love for this to happen. Most GNU software is a hopeless,
 gruesome mess that should be dragged out and shot. Getting rid of as
 much as possible, gradually, is a Very Good Thing; this is how we get
 stability and performance improvements.
 
 
  Brian Fundakowski Feldman  _ __ ___   ___ ___ ___  
  [EMAIL PROTECTED]   _ __ ___ | _ ) __|   \ 
  FreeBSD: The Power to Serve!_ __ | _ \._ \ |) |
http://www.FreeBSD.org/  _ |___/___/___/ 
 
 
 
 






Re: replacing grep(1)

1999-07-27 Thread Doug

On 27 Jul 1999, Dag-Erling Smorgrav wrote:

 I move that we replace GNU grep in our source tree with this
 implementation, once it's been reviewed by all concerned parties.

First, I'm all for this idea, and applaud you and Jamie for taking
it on. I do have a few questions. Does POSIX say anything about grep, and
if so, is this version compliant? Also, I'd like to put in another vote
for full GNU grep feature compliance, since while having our own code is a
good thing, I am against introducing gratuitous differences since I have
enough of those to deal with already.

I think ports building is a good test, but has anyone tested
it with RCS yet? IIRC RCS is heavily dependent on GNU grep, diff and
patch.  I don't think CVS is dependent on external programs anymore
though. 

I use grep heavily in day to day administration tasks so I look
forward to giving this a try.

Doug
-- 
On account of being a democracy and run by the people, we are the only
nation in the world that has to keep a government four years, no matter
what it does.
-- Will Rogers






Re: replacing grep(1)

1999-07-27 Thread Jamie Howard

On Tue, 27 Jul 1999, Doug wrote:

   First, I'm all for this idea, and applaud you and Jamie for taking
 it on. I do have a few questions. Does POSIX say anything about grep, and
 if so, is this version compliant? Also, I'd like to put in another vote
 for full GNU grep feature compliance, since while having our own code is a
 good thing, I am against introducing gratuitous differences since I have
 enough of those to deal with already.

I do not have a copy of POSIX, but I do have Unix98 which is a superset of
POSIX.  Right now, excluding bugs, it is Unix 98 and therefore POSIX
compliant except for -e.  -e should permit multiple patterns and it never
occurred to me that anyone would want to do this.  When used with -F,
multiple patterns are accepted.
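
For reference, supporting several -e patterns mostly comes down to compiling each one and accepting a line if any of them matches. The sketch below shows that loop with regcomp(3)/regexec(3). `compile_all` and `match_any` are hypothetical names, not functions from grep-0.7, and option parsing is left out.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Compile n basic-RE patterns into res[0..n-1].  Returns 0 on success.
 * On failure, frees whatever was already compiled and returns -1.
 */
int
compile_all(regex_t *res, const char **pats, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (regcomp(&res[i], pats[i], REG_NOSUB) != 0) {
			while (i > 0)
				regfree(&res[--i]);
			return (-1);
		}
	}
	return (0);
}

/* Returns 1 if line matches any of the n compiled patterns, else 0. */
int
match_any(const regex_t *res, size_t n, const char *line)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (regexec(&res[i], line, 0, NULL, 0) == 0)
			return (1);
	return (0);
}
```

An alternative would be to join the patterns into one extended RE with `|`, but that changes the syntax the user typed, so the per-pattern loop is the more conservative choice.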
 
   I use grep heavily in day to day administration tasks so I look
 forward to giving this a try.

Cool, d/l it and post a bug-list :)

Jamie






Re: replacing grep(1)

1999-07-27 Thread Doug

On Tue, 27 Jul 1999, Jamie Howard wrote:

 I do not have a copy of POSIX, but I do have Unix98 which is a superset of
 POSIX.  Right now, excluding bugs, it is Unix 98 and therefore POSIX
 compliant

Good news, thanks for addressing this concern. 

 except for -e.  -e should permit multiple patterns and it never
 occurred to me that anyone would want to do this. 

Ah, well, if the world were limited to just what I could imagine,
how boring would that be? The more complete the feature set, the better
off we are for my money.

Doug
-- 
On account of being a democracy and run by the people, we are the only
nation in the world that has to keep a government four years, no matter
what it does.
-- Will Rogers






Re: replacing grep(1)

1999-07-27 Thread James Howard

On Tue, 27 Jul 1999, Doug wrote:

   Ah, well, if the world were limited to just what I could imagine,
 how boring would that be? The more complete the feature set, the better
 off we are for my money.

You misinterpreted, I didn't know you could do that therefore I didn't
implement that.  I certainly understand why you would want to :)






Re: replacing grep(1)

1999-07-27 Thread Bill Fumerola

On 27 Jul 1999, Dag-Erling Smorgrav wrote:

 I move that we replace GNU grep in our source tree with this
 implementation, once it's been reviewed by all concerned parties.

Normally I don't post "me too" messages. I'll make an exception.

Me too.

-- 
- bill fumerola - [EMAIL PROTECTED] - BF1560 - computer horizons corp -
- ph:(800) 252-2421 - [EMAIL PROTECTED] - [EMAIL PROTECTED]  -






Re: replacing grep(1)

1999-07-27 Thread Doug

On Tue, 27 Jul 1999, James Howard wrote:

 On Tue, 27 Jul 1999, Doug wrote:
 
  Ah, well, if the world were limited to just what I could imagine,
  how boring would that be? The more complete the feature set, the better
  off we are for my money.
 
 You misinterpreted, I didn't know you could do that therefore I didn't
 implement that.  I certainly understand why you would want to :)

Ah, gotcha. Even better. :) 

Doug
-- 
On account of being a democracy and run by the people, we are the only
nation in the world that has to keep a government four years, no matter
what it does.
-- Will Rogers






Re: replacing grep(1)

1999-07-27 Thread David O'Brien

$ uname -a

$ grep foo NONEXIST
Segmentation fault (core dumped)

$ gdb /usr/bin/grep grep.core
...
(no debugging symbols found)...
Core was generated by `grep'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libz.so.2...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libc.so.3...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0  0x280a8538 in ftello (fp=0x0)
at /FBSD/src/lib/libc/../libc/stdio/ftell.c:76
76  if (fp->_seek == NULL) {
(gdb) where
#0  0x280a8538 in ftello (fp=0x0)
at /FBSD/src/lib/libc/../libc/stdio/ftell.c:76
#1  0x280a84e1 in ftell (fp=0x0) at /FBSD/src/lib/libc/../libc/stdio/ftell.c:59
#2  0x80490b7 in free () at /FBSD/src/lib/libc/../libc/stdlib/malloc.c:1089
#3  0x80499f1 in free () at /FBSD/src/lib/libc/../libc/stdlib/malloc.c:1089
#4  0x804968b in free () at /FBSD/src/lib/libc/../libc/stdlib/malloc.c:1089
#5  0x8048d3d in free () at /FBSD/src/lib/libc/../libc/stdlib/malloc.c:1089
(gdb)
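
Frames #0 and #1 show ftell() being called with fp=0x0, which is what you'd expect if the result of opening NONEXIST was never checked. (Frames #2 to #5 are all labelled free(), presumably because the binary has no symbols; that part is a guess.) A minimal sketch of the guard, using hypothetical helper names `open_input` and `safe_offset` and not the actual grep-0.7 code:

```c
#include <stdio.h>

/*
 * Open path for reading.  On failure, print "path: reason" to stderr
 * and return NULL so the caller can skip the file.
 */
FILE *
open_input(const char *path)
{
	FILE *fp;

	if ((fp = fopen(path, "r")) == NULL)
		perror(path);
	return (fp);
}

/* Current offset of fp, or -1 if fp is NULL or ftell() fails. */
long
safe_offset(FILE *fp)
{
	if (fp == NULL)
		return (-1L);
	return (ftell(fp));
}
```

The caller would then count the failure towards an exit status of 2 and move on to the next file instead of dereferencing a null stream.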

-- 
-- David([EMAIL PROTECTED]  -or-  [EMAIL PROTECTED])





Re: replacing grep(1)

1999-07-27 Thread Sheldon Hearn


On 27 Jul 1999 13:48:21 +0200, Dag-Erling Smorgrav wrote:

  Version 0.3 broke port-building badly. Does version 0.7 make it through
  a build of a whole stack of ports?
 
 Yes.

Excellent. I'll nuke the port once you've merged the new grep to STABLE.
:-)

Later,
Sheldon.





Re: replacing grep(1)

1999-07-27 Thread Nickolay N. Dudorov
In xzpd7xeb9xc@des.follo.net Dag-Erling Smorgrav d...@yes.no wrote:
 Jamie Howard (howar...@wam.umd.edu), with a little help from yours
 truly, has written a BSD-licensed version of grep(1) which has all the
 functionality of our current (GPLed) implementation, plus a little
 more, in one seventh the source code and one fourth the binary code.
 What's more, the code is actually possible for mere mortals to read
 and understand.

 The source code is available for download from freefall:

  URL:http://www.freebsd.org/~des/software/grep-0.7.tar.gz

 I move that we replace GNU grep in our source tree with this
 implementation, once it's been reviewed by all concerned parties.

Unfortunately abovementioned grep-0.7.tar.gz is
broken.

After making it on the CURRENT system I can only
see:

grep: filename: Undefined error: 0

for every filename.

This is caused by the very unusual return values of the
'grep_open' (and other '..._open') functions, which are declared
as 'int' (and return an int result) but compared with NULL ;-(

I prefer not to include the patch for this because
I am incompatible with such tricks as:

return ((f = fopen(path, mode)) != NULL) - 1;
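
For readers following along, here is a minimal sketch of why that idiom misfires when its result is compared with NULL. The function names below are hypothetical and are not taken from grep-0.7; the sketch only assumes the caller tests the int result against NULL, as the report describes. The expression yields 0 on success and -1 on failure, and 0 compares equal to NULL, so every successful open looks like a failure. Since nothing actually failed, errno is still 0, which would explain the "Undefined error: 0" text.

```c
#include <stdio.h>

/*
 * Hypothetical reconstruction of the quoted idiom: returns 0 on
 * success and -1 on failure, storing the stream in *fp.
 */
static int
open_status(FILE **fp, const char *path, const char *mode)
{
	return ((*fp = fopen(path, mode)) != NULL) - 1;
}

/*
 * What a test against NULL amounts to for an int status: NULL is a
 * null pointer constant, so the comparison is effectively "== 0",
 * which is the success value.
 */
static int
looks_failed_buggy(int status)
{
	return status == 0;
}

/* The test the return convention actually calls for. */
static int
looks_failed_fixed(int status)
{
	return status != 0;
}
```

With this convention a caller has to test for a nonzero status (or for -1) rather than for NULL.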

N.Dudorov





Re: replacing grep(1)

1999-07-27 Thread David Scheidt
On Tue, 27 Jul 1999, Sheldon Hearn wrote:

 In this case, I'm all for the change, since I don't use grep for serious
 regex work and the readability gain outweighs any loss of performance.
 You probably feel the same way. Our opinions are those of developers,
 though. It's always worth remembering that.

Does anyone have numbers about how much slower the new grep is?  I have
been using the port (version 3) for my interactive grepping, and haven't
noticed a speed difference.  I have been using it on zippy machines, though,
where a 30% hit wouldn't be noticed.

David Scheidt






Re: replacing grep(1)

1999-07-27 Thread Sheldon Hearn


On Tue, 27 Jul 1999 07:49:22 EST, David Scheidt wrote:

 Does anyone have numbers about how much slower the new grep is?

Just by the way, if the latest version somehow uses mmap without my
having noticed, then I've introduced a red herring. ;-)

Version 0.3 certainly didn't use mmap. As I understand it, this means
that the performance hit, whatever the magnitude, would be noticed with
larger files.

I've copied the author, who's probably in the best position to give you
hard numbers. :-)

Ciao,
Sheldon.





Re: replacing grep(1)

1999-07-27 Thread Brian F. Feldman
On Tue, 27 Jul 1999, Sheldon Hearn wrote:

 
 
 On Tue, 27 Jul 1999 08:19:38 -0400, Brian F. Feldman wrote:
 
  Getting rid of as much as possible, gradually, is a Very Good Thing;
  this is how we get stability and performance improvements.
 
 Only if the replacements are as stable and robust as their predecessors.

Usually, when we get replacements, they are.

 
 In this case, the implementation we'll be introducing will introduce a
 performance loss, not a gain. As far as stability goes, there's a loss
 involved _if_ passing the GNU grep regression tests is important.

Which it isn't unless they are truly correct in their assumptions of
output behavior.

 
 Don't get me wrong. I'm all for replacing GNU software. Let's just be
 realistic and keep in mind that being non-GNU doesn't necessarily mean
 better.

Not _necessarily_, but realistically...

 
 In this case, I'm all for the change, since I don't use grep for serious
 regex work and the readability gain outweighs any loss of performance.
 You probably feel the same way. Our opinions are those of developers,
 though. It's always worth remembering that.

That's true. I'd like to see the replacement grep do mmaping of the
input files if it doesn't already, as that would speed it up. Anyway,
I haven't tried it out yet because I haven't seen it hit 1.0 :) The
only good pre-1.0 software I've seen has been the GIMP, XRacer, and
some little utilities (like a program called stat(1)).

That reminds me. I'd like to see something like stat(1) go into the source
tree, but only if it were freely licensed, not GPL-infected. I could do
it in a day, I suppose, if it were worth it. Worth it is here defined as
would be accepted to go in usr.bin.

 
 Ciao,
 Sheldon.
 

 Brian Fundakowski Feldman  _ __ ___   ___ ___ ___  
 gr...@freebsd.org   _ __ ___ | _ ) __|   \ 
 FreeBSD: The Power to Serve!_ __ | _ \._ \ |) |
   http://www.FreeBSD.org/  _ |___/___/___/ 






Re: replacing grep(1)

1999-07-27 Thread Jamie Howard
On Tue, 27 Jul 1999, Nickolay N. Dudorov wrote:

   After making it on the CURRENT system I can only
 see:
 
   grep: filename: Undefined error: 0
 
 for every filename.

Every file?

 
   This caused by very unusual return values for
 'grep_open' (and other '..._open') function which is declared
 as 'int' (and return int result) and compared with NULL ;-(
 
   I prefer not to include the patch for this because
 I am uncompatible with such trics as:
 
   return ((f = fopen(path, mode)) != NULL) - 1;

This was done this way because the gzopen and fopen both return pointers
of different types.  Maybe the best thing would be to have grep_open()
return a void pointer since procfile() doesn't keep track of what files
are open and not.  This is ugly and not very reusable, but then again how
many programs need transparent access to both gzip'd and plaintext files?
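
A rough sketch of one way to do that without a bare void pointer: wrap the underlying handle in a small tagged struct and return a pointer to it, or NULL on failure, so callers can use the usual NULL test. All names here are hypothetical, and only the stdio case is filled in; a gzip branch would store the gzFile in the same slot.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tags for the kinds of input the sketch distinguishes. */
enum file_kind { FILE_STDIO, FILE_GZIP };

struct grep_file {
	enum file_kind kind;
	void *handle;	/* FILE * for FILE_STDIO; a gzFile would go here */
};

/* Returns a heap-allocated handle, or NULL if the file cannot be opened. */
static struct grep_file *
grep_open_sketch(const char *path, const char *mode)
{
	struct grep_file *gf = malloc(sizeof(*gf));

	if (gf == NULL)
		return NULL;
	gf->kind = FILE_STDIO;
	gf->handle = fopen(path, mode);
	if (gf->handle == NULL) {
		free(gf);
		return NULL;
	}
	return gf;
}

/* Closes the underlying stream according to its tag and frees the wrapper. */
static void
grep_close_sketch(struct grep_file *gf)
{
	if (gf == NULL)
		return;
	if (gf->kind == FILE_STDIO)
		fclose(gf->handle);
	free(gf);
}
```

The tag costs one extra allocation per file, but it keeps the "which library owns this pointer" question in one place.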

Jamie






Re: replacing grep(1)

1999-07-27 Thread Jamie Howard
On Tue, 27 Jul 1999, Brian F. Feldman wrote:

 That's true. I'd like to see the replacement grep do mmaping of the
 input files if it doesn't already, as that would speed it up. Anyway,

It does not use mmap right now, and this causes a significant performance
hit on larger files.  An older version (I'm thinking .4) would give
equivalent performance on smaller files, 75k or so, occasionally faster.
However, larger files really drag it down, often slower by 900%.
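
For anyone who wants to measure the difference, here is a minimal sketch of the mmap approach under discussion. It is not code from either grep; the function names are made up, the match is a naive fixed-string scan rather than a regex, and it only handles regular files (mmap will not work on a pipe). Whether it is actually faster is exactly what would need benchmarking.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Count the lines in buf[0..len) that contain the fixed string pat. */
static long
count_matching_lines(const char *buf, size_t len, const char *pat)
{
	size_t plen = strlen(pat);
	const char *p = buf, *end = buf + len;
	long count = 0;

	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		const char *eol = (nl != NULL) ? nl : end;
		const char *q;

		/* Naive scan of one line; stops at the first hit. */
		for (q = p; plen <= (size_t)(eol - q); q++) {
			if (memcmp(q, pat, plen) == 0) {
				count++;
				break;
			}
		}
		p = (nl != NULL) ? nl + 1 : end;
	}
	return count;
}

/* Map the whole file read-only and scan it in place; -1 on error. */
static long
count_in_file(const char *path, const char *pat)
{
	struct stat st;
	void *map;
	long count;
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return -1;
	if (fstat(fd, &st) == -1) {
		close(fd);
		return -1;
	}
	if (st.st_size == 0) {		/* a zero-length mmap fails */
		close(fd);
		return 0;
	}
	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);			/* the mapping outlives the descriptor */
	if (map == MAP_FAILED)
		return -1;
	count = count_matching_lines(map, (size_t)st.st_size, pat);
	munmap(map, (size_t)st.st_size);
	return count;
}
```

The point of the design is that no line is ever copied: the scanner works directly on the mapped pages.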

 I haven't tried it out yet because I haven't seen it hit 1.0 :) The
 only good pre-1.0 software I've seen has been the GIMP, XRacer, and
 some little utilities (like a program called stat(1)).
 
 That reminds me. I'd like to see something like stat(1) go into the source
 tree, but only if it were freely licensed, not GPL-infected. I could do
 it in a day, I suppose, if it were worth it. Worth it is here defined as
 would be accepted to go in usr.bin.

I once saw a version of stat that carried a public domain statement on an
HP-UX software archive, I'll see if I can dig that up for you.

Jamie






Re: replacing grep(1)

1999-07-27 Thread Tim Vanderhoek
On Tue, Jul 27, 1999 at 08:23:44AM -0400, Tim Vanderhoek wrote:
 
 How's it compare in speed?  [I'd test it myself, but see my private
 email...]

Okay, following-up on myself, and indirectly Sheldon,

It does seem a little too slow.  I'm not sure that this is because it
doesn't use mmap.  Supposedly the merged buffer/vm means mmap doesn't
make as large a difference as it used to.

On a file with 10+ lines, the speed difference is rather restrictive.
Looking over the gprof output, I think its authors (or some other
intrepid hacker) will find ways to speed it up.  Only about 10% of
the time is spent in procline().  There seems to be a lot of
unnecessary strncpy() that could be _easily_ avoided if free() on
util.c:130 was avoided, but I'll let the authors speak first.  :-)


-- 
This is my .signature which gets appended to the end of my messages.





Re: replacing grep(1)

1999-07-27 Thread Daniel C. Sobral
Brian F. Feldman wrote:
 
 Geez, why don't we just write our own compiler and linker, assembler,
 and everything? Let's get every last bit of GNU out of our system, for
 no reason! This kind of NIH is not necessary, and only hurts us by
 misdirecting our energies.
 /joking
 
 Seriously, I'd love for this to happen. Most GNU software is a hopeless,
 gruesome mess that should be dragged out and shot. Getting rid of as
 much as possible, gradually, is a Very Good Thing; this is how we get
 stability and performance improvements.

In fact, I think the *greatest* advantage of this code is its
readability.

Anyway, both versions exist, so it's not a question of NIH. It's a
question of choosing.

--
Daniel C. Sobral(8-DCS)
d...@newsguy.com
d...@freebsd.org

Is it true that you're a millionaire's son who never worked a day
in your life?
Yeah, I guess so.
Lemme tell you, son, you ain't missed a thing.






Re: replacing grep(1)

1999-07-27 Thread Daniel C. Sobral
Dag-Erling Smorgrav wrote:
 
 Jamie Howard (howar...@wam.umd.edu), with a little help from yours
 truly, has written a BSD-licensed version of grep(1) which has all the
 functionality of our current (GPLed) implementation, plus a little
 more, in one seventh the source code and one fourth the binary code.
 What's more, the code is actually possible for mere mortals to read
 and understand.
 
 The source code is available for download from freefall:
 
  URL:http://www.freebsd.org/~des/software/grep-0.7.tar.gz
 
 I move that we replace GNU grep in our source tree with this
 implementation, once it's been reviewed by all concerned parties.

I'm concerned about performance. Grep performance is relevant to
some. Now, while I don't care if this grep is slower than what we
are using right now, I do care if its _complexity_ is greater.

So, please, could you make sure the algorithmic complexity is not
greater, either by benchmark comparison, or by examining the code?

I would do it, if I had time. But right now I don't, and there is no
need to keep this waiting.

--
Daniel C. Sobral(8-DCS)
d...@newsguy.com
d...@freebsd.org

Is it true that you're a millionaire's son who never worked a day
in your life?
Yeah, I guess so.
Lemme tell you, son, you ain't missed a thing.







Re: replacing grep(1)

1999-07-27 Thread Daniel C. Sobral
Brian F. Feldman wrote:
 
 That reminds me. I'd like to see something like stat(1) go into the source
 tree, but only if it were freely licensed, not GPL-infected. I could do
 it in a day, I suppose, if it were worth it. Worth it is here defined as
 would be accepted to go in usr.bin.

May I discreetly open a can of worms and remind everyone of a very
nice little utility one Matthew Dillon once offered for /bin? I
still think it's worth it, and, as I recall, I wasn't the only one. (In
fact, I think I didn't even voice my opinion at the time...)

I'm talking about cpdup, which can be found in
http://www.backplane.com/FreeBSD/. Someone posted a port at the
time, but I don't know if anyone ever committed the port.

--
Daniel C. Sobral(8-DCS)
d...@newsguy.com
d...@freebsd.org

Is it true that you're a millionaire's son who never worked a day
in your life?
Yeah, I guess so.
Lemme tell you, son, you ain't missed a thing.





Re: replacing grep(1)

1999-07-27 Thread Sheldon Hearn


On Tue, 27 Jul 1999 23:18:14 +0900, Daniel C. Sobral wrote:

 I'm talking about cpdup, which can be found in
 http://www.backplane.com/FreeBSD/. Someone posted a port at the
 time, but I don't know if anyone ever committed the port.

I'll commit a port in the next few days.

Ciao,
Sheldon.





Re: replacing grep(1)

1999-07-27 Thread Garance A Drosihn
At 9:29 AM -0400 7/27/99, Tim Vanderhoek wrote:
 On a file with 10+ lines, the speed difference is rather
 restrictive. [...] Only about 10% of the time is spent in
 procline().  There seems to be a lot of unnecessary strncpy()
 that could be _easily_ avoided if free() on util.c:130 was
 avoided, but I'll let the authors speak first.  :-)

Hmm, strncpy?  Are these calls which really want strncpy
for what it was originally designed for, or are they just
trying to prevent buffer overruns?

If it's the buffer-overrun answer, then maybe this would
be a good test case for using strlcpy instead of strncpy,
and see if it makes a performance difference (since the
code won't waste its time nulling-out bytes that don't
need to be nulled-out).
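
To make the padding difference concrete, here is a small sketch. Because strlcpy(3) is not present in every libc, it uses a stand-in called sketch_strlcpy (a hypothetical name) that follows the documented strlcpy contract: copy at most size-1 bytes, always terminate when size is nonzero, and return strlen(src). strncpy, by contrast, fills the whole remainder of the destination with NUL bytes.

```c
#include <string.h>

/*
 * Portable stand-in for strlcpy(3).  Copies at most size-1 bytes,
 * NUL-terminates whenever size > 0, and returns strlen(src) so the
 * caller can detect truncation.  Bytes past the terminator are left
 * untouched, unlike strncpy, which zero-fills up to size.
 */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = (len < size - 1) ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

Copying a 2-byte string into a large line buffer with strncpy writes the entire buffer; the strlcpy-style copy writes 3 bytes. On long buffers and short lines that difference could plausibly show up in a profile, though that would need measuring.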


---
Garance Alistair Drosehn   =   g...@eclipse.acs.rpi.edu
Senior Systems Programmer  or  dro...@rpi.edu
Rensselaer Polytechnic Institute





Re: replacing grep(1)

1999-07-27 Thread Robert Nordier
 Jamie Howard (howar...@wam.umd.edu), with a little help from yours
 truly, has written a BSD-licensed version of grep(1) which has all the
 functionality of our current (GPLed) implementation, plus a little
 more, in one seventh the source code and one fourth the binary code.

 I move that we replace GNU grep in our source tree with this
 implementation, once it's been reviewed by all concerned parties.

A couple of general problems:

o  Too many diagnostics have "Undefined error: 0" appended.
   Particularly in the case of err(2, re_error) in file.c,
   you probably want to look at using errx() instead.

o  Errors other than "no match" need to return an exit status
   of 2: some in file.c and util.c are returning 1.
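
Both points can be sketched as follows. This is not a patch against grep-0.7: the constant and function names are hypothetical, and the status rule ignores special cases such as -q. The relevant library behaviour is that err(3) appends strerror(errno) to the message while errx(3) does not, so errx() is the right call when errno carries no information (a regex compile failure, for instance).

```c
#include <err.h>
#include <stdio.h>

/* Hypothetical names for the three conventional grep exit statuses. */
enum { GREP_MATCH = 0, GREP_NOMATCH = 1, GREP_TROUBLE = 2 };

/* Any error wins over the match result; otherwise 0 or 1. */
static int
grep_exit_status(int matched, int had_error)
{
	if (had_error)
		return GREP_TROUBLE;
	return matched ? GREP_MATCH : GREP_NOMATCH;
}

/*
 * A regex error message is already complete, so use errx(): no
 * strerror(errno) text is appended.
 */
static void
report_regex_error(const char *re_error)
{
	errx(GREP_TROUBLE, "%s", re_error);
}

/*
 * A failed fopen() sets errno, so err() is appropriate here and the
 * appended text ("No such file or directory", etc.) is meaningful.
 */
static FILE *
open_or_die(const char *path)
{
	FILE *f = fopen(path, "r");

	if (f == NULL)
		err(GREP_TROUBLE, "%s", path);
	return f;
}
```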

A more general concern is whether Henry Spencer's regex routines
-- at least in our present alpha-quality version -- are up to
supporting a grep without much further debugging.  I don't recall
many of the problems I found when I last looked at these, though
here are two, after 5 minutes playing:

echo xx | grep '\(x\{1,2\}\)\1'
echo x | grep '[--x]'
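
These two cases can also be tried against the regex library directly, without grep in the way. The helper below is a hypothetical test harness, not part of any grep; it uses only the POSIX regcomp/regexec/regfree calls and makes no claim about what a particular library returns for the two patterns.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if text matches pattern, 0 if it does not, and -1 if the
 * pattern fails to compile.  cflags is passed to regcomp() unchanged
 * (0 for basic regular expressions, REG_EXTENDED for extended ones).
 */
static int
try_match(const char *pattern, const char *text, int cflags)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, cflags) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0) ? 1 : 0;
}
```

The two reported cases would be try_match("\\(x\\{1,2\\}\\)\\1", "xx", 0) and try_match("[--x]", "x", 0). As far as I can tell both inputs ought to match, but that reading is my assumption and is not stated in the report.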

--
Robert Nordier





Re: replacing grep(1)

1999-07-27 Thread Julian Elischer


On Tue, 27 Jul 1999, Brian F. Feldman wrote:

 On Tue, 27 Jul 1999, Soren Schmidt wrote:
 
  It seems Dag-Erling Smorgrav wrote:
   
   I move that we replace GNU grep in our source tree with this
   implementation, once it's been reviewed by all concerned parties.
  
  Go for it, the more GNU stuff we nuke the better :)
  
  -Søren
  
 
 Geez, why don't we just write our own compiler and linker, assembler,
 and everything? Let's get every last bit of GNU out of our system, for
 no reason! This kind of NIH is not necessary, and only hurts us by
 misdirecting our energies.
 /joking

Actually there is a difference between grep and gcc.

You wouldn't ship cc on a binary-only embedded system,
but you might want to ship grep (so that control scripts can use it).

 
 Seriously, I'd love for this to happen. Most GNU software is a hopeless,
 gruesome mess that should be dragged out and shot. Getting rid of as
 much as possible, gradually, is a Very Good Thing; this is how we get
 stability and performance improvements.
 
 
  Brian Fundakowski Feldman  _ __ ___   ___ ___ ___  
  gr...@freebsd.org   _ __ ___ | _ ) __|   \ 
  FreeBSD: The Power to Serve!_ __ | _ \._ \ |) |
http://www.FreeBSD.org/  _ |___/___/___/ 
 
 
 
 






Re: replacing grep(1)

1999-07-27 Thread Doug
On 27 Jul 1999, Dag-Erling Smorgrav wrote:

 I move that we replace GNU grep in our source tree with this
 implementation, once it's been reviewed by all concerned parties.

First, I'm all for this idea, and applaud you and Jamie for taking
it on. I do have a few questions. Does POSIX say anything about grep, and
if so, is this version compliant? Also, I'd like to put in another vote
for full GNU grep feature compliance, since while having our own code is a
good thing, I am against introducing gratuitous differences since I have
enough of those to deal with already.

I think ports building is a good test, but has anyone tested
it with RCS yet? IIRC RCS is heavily dependent on GNU grep, diff and
patch.  I don't think CVS is dependent on external programs anymore
though. 

I use grep heavily in day to day administration tasks so I look
forward to giving this a try.

Doug
-- 
On account of being a democracy and run by the people, we are the only
nation in the world that has to keep a government four years, no matter
what it does.
-- Will Rogers






Re: replacing grep(1)

1999-07-27 Thread Wolfram Schneider
On 1999-07-27 13:37:35 +0200, Dag-Erling Smorgrav wrote:
 Jamie Howard (howar...@wam.umd.edu), with a little help from yours
 truly, has written a BSD-licensed version of grep(1) which has all the
 functionality of our current (GPLed) implementation, plus a little
 more, in one seventh the source code and one fourth the binary code.
 What's more, the code is actually possible for mere mortals to read
 and understand.
 
 The source code is available for download from freefall:
 
  URL:http://www.freebsd.org/~des/software/grep-0.7.tar.gz
 
 I move that we replace GNU grep in our source tree with this
 implementation, once it's been reviewed by all concerned parties.

It is 25 times slower than GNU grep ;-(((

$ time /usr/bin/grep foobar  /var/tmp/mailbox /dev/null 
0.90 real 0.78 user 0.12 sys
$ time /usr/local/bin/grep foobar  /var/tmp/mailbox /dev/null 
   24.31 real22.36 user 1.69 sys

(/var/tmp/mailbox is 81MB large).

I often use grep for large data (in main memory). I don't care about the
GNU license. I care about poor performance.

-- 
Wolfram Schneider wo...@freebsd.org http://wolfram.schneider.org





Re: replacing grep(1)

1999-07-27 Thread Jamie Howard
On Tue, 27 Jul 1999, Doug wrote:

   First, I'm all for this idea, and applaud you and Jamie for taking
 it on. I do have a few questions. Does POSIX say anything about grep, and
 if so, is this version compliant? Also, I'd like to put in another vote
 for full GNU grep feature compliance, since while having our own code is a
 good thing, I am against introducing gratuitous differences since I have
 enough of those to deal with already.

I do not have a copy of POSIX, but I do have Unix98 which is a superset of
POSIX.  Right now, excluding bugs, it is Unix 98 and therefore POSIX
compliant except for -e.  -e should permit multiple patterns and it never
occurred to me that anyone would want to do this.  When used with -F,
multiple patterns are accepted.
 
   I use grep heavily in day to day administration tasks so I look
 forward to giving this a try.

Cool, d/l it and post a bug-list :)

Jamie






Re: replacing grep(1)

1999-07-27 Thread Doug
On Tue, 27 Jul 1999, Jamie Howard wrote:

 I do not have a copy of POSIX, but I do have Unix98 which is a superset of
 POSIX.  Right now, excluding bugs, it is Unix 98 and therefore POSIX
 compliant

Good news, thanks for addressing this concern. 

 except for -e.  -e should permit multiple patterns and it never
 occurred to me that anyone would want to do this. 

Ah, well, if the world were limited to just what I could imagine,
how boring would that be? The more complete the feature set, the better
off we are for my money.

Doug
-- 
On account of being a democracy and run by the people, we are the only
nation in the world that has to keep a government four years, no matter
what it does.
-- Will Rogers





