Re: replacing grep(1)
John-Mark Gurney wrote:

> right now, I'm trying to think of a way to eliminate the fgetln searching for end of line... of course this would eliminate some of the simplicity of design, but we can get a BIG speed increase if we simply don't scan for the new line unless we NEED to... and if we do, why not use regexec to search for us?

As Dillon said, the decrease in speed of the scan might not be that great. On the other hand, a decent pattern matching algorithm *does not* examine every character (which is why GNU grep performs so much better with bigger patterns).

--
Daniel C. Sobral (8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]

- Jordan, God, what's the difference?
- God doesn't belong to the -core.

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
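The remark that a decent matcher does not examine every character can be illustrated with a skip-table search. The following is a minimal sketch using Horspool's simplification of Boyer-Moore, chosen here for illustration only; it is not taken from GNU grep's source, and the function name `horspool` and the `probes` counter are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Boyer-Moore-Horspool search for pat (m bytes) in text (n bytes).
 * Returns the offset of the first match, or -1 if there is none.
 * If probes is non-NULL it receives the number of alignments tried.
 */
long
horspool(const char *text, size_t n, const char *pat, size_t m, size_t *probes)
{
	size_t shift[UCHAR_MAX + 1];
	size_t i, pos = 0, tried = 0;
	long found = -1;

	assert(text != NULL && pat != NULL && m > 0);

	/* A byte that is absent from the pattern allows a full-length skip. */
	for (i = 0; i <= UCHAR_MAX; i++)
		shift[i] = m;
	/* Bytes in the pattern (except its last) allow a shorter skip. */
	for (i = 0; i + 1 < m; i++)
		shift[(unsigned char)pat[i]] = m - 1 - i;

	while (pos + m <= n) {
		tried++;
		if (memcmp(text + pos, pat, m) == 0) {
			found = (long)pos;
			break;
		}
		/* Skip according to the text byte under the pattern's end. */
		pos += shift[(unsigned char)text[pos + m - 1]];
	}
	if (probes != NULL)
		*probes = tried;
	return found;
}
```

On text that shares no bytes with the pattern, the number of alignments tried is roughly the text length divided by the pattern length, which is consistent with longer patterns searching faster.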
Re: replacing grep(1)
On Fri, 30 Jul 1999 22:07:26 -0400, Tim Vanderhoek wrote:

> b$ time ./grep -E '(vt100)|(printer)' longfile > /dev/null
> b$ time grep '(vt100)|(printer)' longfile > /dev/null

You think that's fair? Surely you can't expect Jamie's extended regex support to outperform GNU's simple regex support? :-)

Ciao,
Sheldon.
Re: replacing grep(1)
On Sat, Jul 31, 1999 at 11:56:16PM +0200, Sheldon Hearn wrote:

> > b$ time ./grep -E '(vt100)|(printer)' longfile > /dev/null
> > b$ time grep '(vt100)|(printer)' longfile > /dev/null
>
> You think that's fair? Surely you can't expect Jamie's extended regex support to outperform GNU's simple regex support? :-)

GNU has no simple regex support. Actually, neither did Jamie's by the time I did that test, but I added the -E flag to make it obvious what was going on. :)

I rather hope that the rumoured newer version of H. Spencer's regex lib is faster... Being as slow for that pattern as it is has got to be a bug of some sort... It's actually faster to scan the file twice, once for the first string and then for the second.

--
This is my .signature which gets appended to the end of my messages.
Re: replacing grep(1)
On Sat, 31 Jul 1999, Tim Vanderhoek wrote:

> I rather hope that the rumoured newer version of H. Spencer's regex lib is faster... Being as slow for that pattern as it is has got to be a bug of some sort... It's actually faster to scan the file twice, once for the first string and then for the second.

If it is not, how about linking it with libregex? I realize it is GNU too, but it will be there whether or not grep gets replaced and the authors were at least kind enough to LGPL it instead.

Hey, maybe someone who knows more about regular expressions than I do would feel compelled to rewrite GNU regex... :) I bet the existing Spencer libraries would be a good starting point and maybe the rumored new version is a great starting point... But that's enough hint dropping...

Jamie
Re: replacing grep(1)
James Howard [EMAIL PROTECTED] writes:

> DES tells me he has a new version (0.10) which mmap()s. It supposedly cuts the run time down significantly, I do not have the numbers in front of me. Unfortunately he has not posted this version yet so I cannot download it and run it myself.

It's in the usual place (ftp://ftp.ofug.org/pub/grep/).

> He also says that if mmap fails, he drops back to stdio. This should only happen in the NFS case, the 2G case, etc.

Any case in which a) the file is too large to mmap, b) the file is not a regular file, or c) mmap() fails (e.g. NFS).

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
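The three fallback conditions listed above can be sketched as follows. This is a minimal illustration, not the code from grep 0.10; the function name `map_or_null` is hypothetical, and a NULL return stands for "the caller should fall back to stdio".

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Try to map the file at path read-only. On success, return the mapping
 * and store its length in *len. Return NULL when mapping is not possible.
 */
void *
map_or_null(const char *path, size_t *len)
{
	struct stat st;
	void *p = NULL;
	int fd;

	assert(path != NULL && len != NULL);
	if ((fd = open(path, O_RDONLY)) == -1)
		return NULL;
	if (fstat(fd, &st) == 0 &&
	    S_ISREG(st.st_mode) &&			/* (b) regular files only */
	    st.st_size > 0 &&				/* empty files cannot be mapped */
	    (uintmax_t)st.st_size <= SIZE_MAX) {	/* (a) size must fit in size_t */
		p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
		if (p == MAP_FAILED)			/* (c) mmap() itself failed */
			p = NULL;
		else
			*len = (size_t)st.st_size;
	}
	close(fd);	/* a successful mapping outlives the descriptor */
	return p;
}
```

A caller would release a non-NULL result with `munmap(p, len)` once the file has been searched.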
Re: replacing grep(1)
John-Mark Gurney [EMAIL PROTECTED] writes:

> it was VERY simple to do... and attached is the patch... this uses the option REG_STARTEND to do what the copy was trying to do... all of the code to use REG_STARTEND was already there, it just needed to be enabled..

Funnily, I experience a near-doubling of running time with similar patches.

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
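For readers unfamiliar with the flag under discussion: REG_STARTEND is a non-POSIX extension to regexec() that takes the range to search from the first regmatch_t entry, so the line does not need to be copied and NUL-terminated first. The sketch below shows both approaches side by side; the helper name `match_range` is hypothetical, and the copying branch is only an assumed approximation of what procfile() did before the patch.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Match a compiled regex against the byte range [buf, buf + len), which
 * need not be NUL-terminated. Returns 1 on a match, 0 otherwise.
 */
int
match_range(const regex_t *re, const char *buf, size_t len)
{
	regmatch_t pm;

	assert(re != NULL && buf != NULL);
#ifdef REG_STARTEND
	/* The extension reads the range from pm on input: no copy needed. */
	pm.rm_so = 0;
	pm.rm_eo = (regoff_t)len;
	return regexec(re, buf, 1, &pm, REG_STARTEND) == 0;
#else
	/* Portable fallback: copy the range so it can be NUL-terminated. */
	char *tmp = malloc(len + 1);
	int hit;

	if (tmp == NULL)
		return 0;
	memcpy(tmp, buf, len);
	tmp[len] = '\0';
	hit = regexec(re, tmp, 1, &pm, 0) == 0;
	free(tmp);
	return hit;
#endif
}
```

Which branch is faster evidently depends on the regex library and the machine, as the conflicting measurements in this thread show.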
Re: replacing grep(1)
John-Mark Gurney wrote:

> ok, I just made a patch to eliminate the copy that was happening in procfile, and it sped up a grep of a 5meg termcap from about 2.9sec down to .6 seconds... this includes time spent profiling the program.. GNU grep w/o profiling only takes .15sec so we ARE getting closer to GNU grep...

Rather impressive. But... did you run these tests more than once, to account for vm caching?

> it was VERY simple to do... and attached is the patch... this uses the option REG_STARTEND to do what the copy was trying to do... all of the code to use REG_STARTEND was already there, it just needed to be enabled..

Just for the record... :-) This eliminates one of the "added complexities" I pointed out.

--
Daniel C. Sobral (8-DCS)
[EMAIL PROTECTED]
[EMAIL PROTECTED]

"Is it true that you're a millionaire's son who never worked a day in your life?"
"Yeah, I guess so."
"Lemme tell you, son, you ain't missed a thing."
Re: replacing grep(1)
"Daniel C. Sobral" [EMAIL PROTECTED] writes:

> Dag-Erling Smorgrav wrote:
> > To be precise, I experience a 30% decrease in system time and a 100% increase in user time when I use REG_STARTEND and eliminate the malloc() / memcpy() calls in procfile().
>
> Could you please test my patch that removes malloc() but not memcpy()? Here it is again, though against an old version:

Yeah. You can do even better by declaring ln static and never free()ing it.

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: replacing grep(1)
"Daniel C. Sobral" [EMAIL PROTECTED] writes:

> Could you please test my patch that removes malloc() but not memcpy()? Here it is again, though against an old version:

Bingo. REG_STARTEND is significantly more expensive than memcpy().

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: replacing grep(1)
On Fri, Jul 30, 1999 at 10:56:55PM +0900, Daniel C. Sobral wrote:

> I said that I did not care whether the thing is inside or outside the regexp library.

Yes, although I think at this point it's obvious we're coming at this discussion from fairly different perspectives. By the time you brought up complexity originally, I had more or less decided that I did not want to see the new grep imported without significant speed improvements and was concerned with how to improve grep. Your interest is in debating that point (fortunately arguing for the side I agree with :).

> 4) grep -e 123 456 world.build

[I assume "grep -e 123 -e 124 world.build"]

> One can clearly see that GNU grep has a much better complexity in the cases of longer patterns or multiple patterns with common prefix.

Alright, someone else already mentioned to me in email that I totally ignored what differences involved multiple patterns. Combining multiple patterns is a big win if those two patterns have a common prefix (I hadn't considered the case of similar patterns before, actually). Combining multiple patterns when they're dissimilar doesn't appear to help much (which is the only case I had considered -- my mistake, and also the reason I ignored what you said about multiple patterns).

I'm surprised by the way GNU grep is able to handle longer patterns, and I probably wouldn't have noticed it unless I'd taken some time to examine the GNU source. Congratulations, you win. :)

The rest of your lengthy message mostly goes on to repeat the fact that GNU grep is able to merge multiple patterns with a common prefix (and postfix?) to good effect.

> It also shows that the new grep spends a lot of time in an activity not related to the search itself, since it does multiple patterns by

Well, duh. This is really why my reaction to "complexity analysis" is (still) what it is. Complexity analysis is almost only useful for comparing two different algorithms, and the fact that the new grep spends a lot of time doing things other than pattern searching is quite obvious after a casual perusal of the source. Complexity analysis does not (directly) help improving an algorithm. With the possible exception of the idea of merging common prefixes, most of this is not useful (at this stage) to improving grep.

If I was going to propose replacing the existing GNU grep, I would (and always would have) done considerably more speed trials than the simple one in my last message.

> It would seem that GNU grep is superior in the case of partial matches without a full match too, but the standard deviation for the

That is almost certainly something inside the regex library, which I have repeatedly said I'm not interested in even looking at. If our regex library is too slow, then we need to look into the newer one that Henry Spencer is rumoured to be sitting on.

--
This is my .signature which gets appended to the end of my messages.
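The idea of combining multiple patterns, rather than calling regexec() once per pattern per line, can be sketched as below. This is an assumed approach, not code from either grep: the helper `compile_any` is hypothetical and simply joins the patterns into one extended regular expression of the form (p0)|(p1)|..., so it treats every pattern as an ERE.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Compile pats[0..n-1] into a single ERE of the form (p0)|(p1)|...
 * Returns 0 on success, -1 on allocation or compile failure.
 */
int
compile_any(regex_t *re, const char *const *pats, size_t n)
{
	size_t i, total = 1;		/* room for the trailing NUL */
	char *expr, *w;
	int rc;

	assert(re != NULL && pats != NULL && n > 0);
	for (i = 0; i < n; i++)
		total += strlen(pats[i]) + 3;	/* "(", ")" and "|" */
	if ((expr = malloc(total)) == NULL)
		return -1;

	w = expr;
	for (i = 0; i < n; i++) {
		size_t len = strlen(pats[i]);

		if (i > 0)
			*w++ = '|';
		*w++ = '(';
		memcpy(w, pats[i], len);
		w += len;
		*w++ = ')';
	}
	*w = '\0';

	/* REG_NOSUB: only a yes/no answer is needed per line. */
	rc = regcomp(re, expr, REG_EXTENDED | REG_NOSUB);
	free(expr);
	return rc == 0 ? 0 : -1;
}
```

Whether one combined expression beats several separate scans depends on the regex library; the '(vt100)|(printer)' timings elsewhere in this thread suggest it was not a win with the library the new grep used at the time.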
Re: replacing grep(1)
On Fri, Jul 30, 1999 at 03:27:20PM +0200, Dag-Erling Smorgrav wrote:

> > it was VERY simple to do... and attached is the patch... this uses the option REG_STARTEND to do what the copy was trying to do... all of the code to use REG_STARTEND was already there, it just needed to be enabled..
>
> Funnily, I experience a near-doubling of running time with similar patches.

Strange... His patches made grep on my system much faster than the original 0.10 and almost as fast as GNU grep.

    b$ /usr/bin/time ./grep-10 -e printer longfile > /dev/null
            1.16 real         0.97 user         0.19 sys
    b$ /usr/bin/time ./grep-10-jmg -e printer longfile > /dev/null
            0.48 real         0.43 user         0.04 sys
    b$ /usr/bin/time grep -e printer longfile > /dev/null
            0.28 real         0.09 user         0.18 sys

This is one of the original Celerons, FWIW. Once in a while that gives me performance numbers somewhat different from any other Intel.

--
This is my .signature which gets appended to the end of my messages.
Re: replacing grep(1)
On Fri, Jul 30, 1999 at 03:27:20PM +0200, Dag-Erling Smorgrav wrote:

> Funnily, I experience a near-doubling of running time with similar patches.

Incidentally, it seems that it's not possible to assume that our regex library is even anywhere in the same league as the GNU regex library.

    b$ time ./grep -E '(vt100)|(printer)' longfile > /dev/null

    real    0m21.284s
    user    0m22.034s
    sys     0m0.083s

Now, with a profiled executable with optimization turned off it takes about 25 seconds. Regardless, it appears to spend 98% of its time in regexec(), which is good, since that's where it should be spending time. [I had been intending to combine multiple patterns, ultimately combining in a '\n' to avoid the memchr() in mmopen].

    b$ time grep '(vt100)|(printer)' longfile > /dev/null

    real    0m0.267s
    user    0m0.109s
    sys     0m0.157s

98% * 20 = ~19... Without an improved regex library, any mildly complicated pattern will bring the new grep to its knees.

This could be the dfa helping GNU grep more than having a better regexp library... Probably both. I wonder how well the devel/pcre port would do POSIX regular expressions.

--
This is my .signature which gets appended to the end of my messages.
Re: replacing grep(1)
Tim Vanderhoek vand...@ecf.utoronto.ca writes:

> I do. Still far too slow. I'll work on this tomorrow, since that seems the only way to convince people that mmap is not such a big win. :-(

mmap() gives a fourfold speed increase. I call that a big win. I have a few other ideas which will make 0.11 even faster.

DES
--
Dag-Erling Smorgrav - d...@flood.ping.uio.no
Re: replacing grep(1)
Dag-Erling Smorgrav d...@flood.ping.uio.no writes:

> John-Mark Gurney gurne...@efn.org writes:
> > it was VERY simple to do... and attached is the patch... this uses the option REG_STARTEND to do what the copy was trying to do... all of the code to use REG_STARTEND was already there, it just needed to be enabled..
>
> Funnily, I experience a near-doubling of running time with similar patches.

To be precise, I experience a 30% decrease in system time and a 100% increase in user time when I use REG_STARTEND and eliminate the malloc() / memcpy() calls in procfile().

DES
--
Dag-Erling Smorgrav - d...@flood.ping.uio.no
Re: replacing grep(1)
Tim Vanderhoek wrote:

> I'm sorry. I've read your message and have decided that you're wrong.

Not that you did bother to counter the points I made. You only comment on the one thing I said was probably insignificant. Are you taking your clues from me? :-)

> Outside of the regexp library, algorithmic complexity is not a factor here. It would take a beanbag to write anything other than an O(N) algorithm.

I said that I did not care whether the thing is inside or outside the regexp library. And a N*search+N*copy, as opposed to N*search, *is* relevant. And that N*copy is outside regexp. And, just for the reference, GNU Grep uses a dfa to identify likely matches before letting gnuregexp work.

> The proposed grep is slow, very slow, and I've sent a long message to James outlining how to make it much faster, but algorithmic complexity is not an issue.

So you say without having checked.

> > > The test you suggested doesn't show anything about that algorithmic complexity, though.
> >
> > Yeah? Try to back that with the results of the tests I suggested.
>
> No, it's not even worth my time. Now look. You've gotten me so upset I actually went and did a simple test. The test showed I'm right and you're wrong. Catting X number of copies of /etc/termcap into longfile causes grep, searching longfile for all occurrences of "printer", to use an extra 0.03 seconds for every repetition of /etc/termcap in longfile. Gee, linear complexity wrt file length. Who could've guessed!?

That does not *begin* to cover the cases I outlined.

> What'ya bet GNU grep also exhibits linear complexity? :) Admit it, you jumped in with some bullshit about complexity when had you taken the time to look into what James meant when he said "it now spends 50% of its time in procline()" you would have kept quiet, realizing that he was talking about a constant factor in the complexity analysis, a subject where comments such as "it now spends 50% of its time in procline()" are relevant.

Ok, here is the _DATA_ backing my bullshit.

First table: searching for non-existent patterns

Tests:

    1) grep -e 123 world.build
    2) grep -e 123456 world.build
    3) grep -e 123 124 world.build
    4) grep -e 123 456 world.build

These were made with GNU grep, the version 0.9 of the new grep, and that version with the patch I sent previously (this latter was unintended -- only after completing the test I realized the executable was the one with my patches). Each test was repeated five times after both the executable and the target file were cached. I show here the averages of the "real" line from time. The user and sys values were actually more interesting, but with much greater deviation. :-)

        GNU grep    New grep    Patched grep
    1)  0.09945s    0.4460s     0.3870s
    2)  0.07225s    0.4424s     0.3894s
    3)  0.12200s    0.6352s     0.5814s
    4)  0.18240s    0.6364s     0.5796s

One can clearly see that GNU grep has a much better complexity in the cases of longer patterns or multiple patterns with common prefix.

It also shows that the new grep spends a lot of time in an activity not related to the search itself, since it does multiple patterns by calling regexec() multiple times, but 2:1 is not the proportion you see up there.

Also, the patch I introduced to eliminate N*(malloc()+free()), N being the number of lines searched, significantly reduces that overhead (overhead as in, *beyond* the time spent in regexec()).

Second table: searching for existing patterns

Tests:

    1) grep -e net world.build > /dev/null
    2) grep -e netipx world.build > /dev/null
    3) grep -e netinet world.build > /dev/null
    4) grep -e netinet -e netipx world.build > /dev/null

        GNU grep    New grep
    1)  0.10750s    0.57060s
    2)  0.07575s    0.46375s
    3)  0.07416s    0.46700s
    4)  0.09950s    0.67440s

Though these tests involve more factors because each has a different number of matches, it again shows very clearly that the new grep has increased complexity in the case of multiple patterns. See there, cases 1 and 4. The latter has *fewer* matches than the former.

Third table: non-existing pattern on different sized files

Tests:

    1) grep 123 world.build
    2) grep 123 world.build.2 (two times world.build)
    3) grep 123 world.build.3 (three times world.build)
    4) grep 123 world.build.4 (four times world.build)

        GNU grep    New grep
    1)  0.09600s    0.44750s
    2)  0.16425s    0.89075s
    3)  0.24760s    1.30850s
    4)  0.31833s    1.75900s

Linear, it would seem... but, alas, this is to be expected. Grep searches inside lines, and the above does not increase the size of a line, only the number of them. Still, it's a relief that the new grep does not have a worse performance in this most simple test.

Fourth table: non-existing patterns on files with different line sizes.

Tests:

    1) grep abc line10
    2) grep abc line20
    3) grep 124 line10
    4)
Re: replacing grep(1)
Dag-Erling Smorgrav wrote:

> To be precise, I experience a 30% decrease in system time and a 100% increase in user time when I use REG_STARTEND and eliminate the malloc() / memcpy() calls in procfile().

Could you please test my patch that removes malloc() but not memcpy()? Here it is again, though against an old version:

--- util.c.orig	Thu Jul 29 19:14:17 1999
+++ util.c	Thu Jul 29 20:49:16 1999
@@ -107,6 +107,8 @@
 	ln.file = fn;
 	ln.line_no = 0;
+	ln.bufsize = 81;	/* Magical constants, yeah! */
+	ln.dat = grep_malloc(81);
 	linesqueued = 0;
 	if (Bflag > 0)
@@ -115,11 +117,14 @@
 		ln.off = grep_tell();
 		if ((tmp = grep_getln(&ln.len)) == NULL)
 			break;
-		ln.dat = grep_malloc(ln.len + 1);
+		if (ln.bufsize < ln.len + 1)
+			ln.dat = grep_realloc(ln.dat, ln.len + 1);
 		memcpy(ln.dat, tmp, ln.len);
-		ln.dat[ln.len] = 0;
 		if (ln.len > 0 && ln.dat[ln.len - 1] == '\n')
 			ln.dat[--ln.len] = 0;
+		else
+			ln.dat[ln.len] = 0;
+
 		ln.line_no++;
 		z = tail;
@@ -127,9 +132,9 @@
 			enqueue(ln);
 			linesqueued++;
 		}
-		free(ln.dat);
 		c += t;
 	}
+	free(ln.dat);
 	if (Bflag > 0)
 		clearqueue();
 	grep_close();
--- grep.h.orig	Thu Jul 29 20:47:52 1999
+++ grep.h	Thu Jul 29 20:48:34 1999
@@ -35,6 +35,7 @@
 typedef struct {
 	size_t	 len;
+	size_t	 bufsize;
 	int	 line_no;
 	int	 off;
 	char	*file;

--
Daniel C. Sobral (8-DCS)
d...@newsguy.com
d...@freebsd.org

"Is it true that you're a millionaire's son who never worked a day in your life?"
"Yeah, I guess so."
"Lemme tell you, son, you ain't missed a thing."
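The buffer-reuse idea in the patch above (allocate once, grow only when a line does not fit, free once at the end) can be sketched as a standalone helper. This is an illustration rather than the patch itself: `struct linebuf` and `linebuf_set` are hypothetical names, plain realloc() stands in for grep_realloc(), and this version also records the new capacity after growing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct linebuf {
	char	*dat;		/* NUL-terminated copy of the current line */
	size_t	 len;		/* length without the trailing newline */
	size_t	 bufsize;	/* bytes currently allocated for dat */
};

/*
 * Copy src[0..len) into lb, growing the buffer only when it is too small,
 * and strip one trailing newline. Returns 0 on success, -1 if out of memory.
 */
int
linebuf_set(struct linebuf *lb, const char *src, size_t len)
{
	assert(lb != NULL && src != NULL);
	if (lb->bufsize < len + 1) {
		char *p = realloc(lb->dat, len + 1);

		if (p == NULL)
			return -1;
		lb->dat = p;
		lb->bufsize = len + 1;	/* remember the new capacity */
	}
	memcpy(lb->dat, src, len);
	if (len > 0 && lb->dat[len - 1] == '\n')
		len--;
	lb->dat[len] = '\0';
	lb->len = len;
	return 0;
}
```

Starting from a zeroed struct, the allocator is only called when a line is longer than any seen before, instead of once per line.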
Re: replacing grep(1)
Daniel C. Sobral scribbled this message on Jul 30:

> Dag-Erling Smorgrav wrote:
> > To be precise, I experience a 30% decrease in system time and a 100% increase in user time when I use REG_STARTEND and eliminate the malloc() / memcpy() calls in procfile().
>
> Could you please test my patch that removes malloc() but not memcpy()? Here it is again, though against an old version:

weird, I was running your patch, and at first I would get from .69 up to 1.03 seconds run time, but I can't seem to generate that problem right now...

w/ your patches I'm getting around .67 to .7 seconds for:

    time ./grep THIS /tmp/ports/freegrep/work/grep-0.10/termcap.long > /dev/null

            0.68 real         0.63 user         0.03 sys
            0.67 real         0.65 user         0.01 sys
            0.67 real         0.63 user         0.03 sys
            0.67 real         0.63 user         0.03 sys
            0.67 real         0.66 user         0.00 sys
            0.67 real         0.64 user         0.02 sys

summary of gprof output:

    [3]     50.1    0.02    0.21  108213  procline [3]
    [4]     46.7    0.02    0.19  108213  regexec [4]
    [7]     28.5    0.13    0.00  108214  mmfgetln [7]
    [10]     4.8    0.00    0.02    2393  grep_realloc [10]

with my patch and the exact same command, I get .58 to .59 seconds...

            0.58 real         0.54 user         0.03 sys
            0.58 real         0.53 user         0.04 sys
            0.58 real         0.55 user         0.02 sys
            0.58 real         0.57 user         0.00 sys
            0.59 real         0.55 user         0.02 sys
            0.58 real         0.55 user         0.02 sys

summary of gprof output:

    [3]     57.1    0.04    0.19  108213  procline [3]
    [4]     48.0    0.02    0.17  108213  regexec [4]
    [7]     34.1    0.13    0.00  108214  mmfgetln [7]
    [10]     2.0    0.01    0.00       1  _munmap [10]

(I include _munmap because realloc/malloc/free are in the 0.0% on my patch)

and grep 0.10 w/o patches:

            2.82 real         1.63 user         1.12 sys
            2.79 real         1.53 user         1.20 sys
            2.80 real         1.65 user         1.08 sys
            2.84 real         1.67 user         1.10 sys
            2.82 real         1.67 user         1.08 sys
            2.91 real         1.66 user         1.14 sys

summary of gprof output:

    [5]     55.1    1.12    0.00   74985  _madvise [5]
    [7]     13.3    0.04    0.23  108213  regexec [7]
    [9]      8.4    0.00    0.17  108217  grep_malloc [9]
    [13]     6.5    0.13    0.00  108214  mmfgetln [13]

all of the programs were compiled w/ the exact same options... that is I added -g -pg to the CFLAGS in the Makefile to generate profiling info..

I'm not sure about you, but on my k6/200, the STARTEND is more efficient than the memcpy/realloc, and to tell you the truth, I can't see why it'd be more efficient to copy possibly multiple kilobytes of data than to just use indexes instead of modifying a ptr...

right now, I'm trying to think of a way to eliminate the fgetln searching for end of line... of course this would eliminate some of the simplicity of design, but we can get a BIG speed increase if we simply don't scan for the new line unless we NEED to... and if we do, why not use regexec to search for us?

--
John-Mark Gurney                    Voice: +1 541 684 8449
Cu Networking                       P.O. Box 5693, 97405

The soul contains in itself the event that shall presently befall it. The event is only the actualizing of its thought. -- Ralph Waldo Emerson
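The proposal to skip the per-line newline scan can be sketched like this: search the whole buffer first and only look for line boundaries around a hit. This is a hedged illustration, not the eventual grep code. A fixed string stands in for regexec() to keep the sketch portable, the pattern is assumed not to contain a newline, and `find_matching_line` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Find the first occurrence of pat in buf[0..n) and report the line that
 * contains it as the half-open range [*start, *end), newline excluded.
 * Returns 1 on a hit, 0 otherwise.
 */
int
find_matching_line(const char *buf, size_t n, const char *pat,
    size_t *start, size_t *end)
{
	size_t m, i, s;
	const char *nl;

	assert(buf != NULL && pat != NULL && start != NULL && end != NULL);
	m = strlen(pat);
	assert(m > 0);

	for (i = 0; i + m <= n; i++) {
		if (memcmp(buf + i, pat, m) != 0)
			continue;
		/* Only now look for newlines: backwards to the line start... */
		for (s = i; s > 0 && buf[s - 1] != '\n'; s--)
			;
		/* ...and forwards to the line end (or the end of the buffer). */
		nl = memchr(buf + i, '\n', n - i);
		*start = s;
		*end = (nl != NULL) ? (size_t)(nl - buf) : n;
		return 1;
	}
	return 0;
}
```

When nothing matches, no newline is ever examined as such, which is where the hoped-for saving would come from.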
Re: replacing grep(1)
On Fri, Jul 30, 1999 at 10:56:55PM +0900, Daniel C. Sobral wrote: I said that I did not care whether the thing is inside or outside the regexp library. Yes, although I think at this point it's obvious we're coming at this discussion from fairly different perspectives. By the time you brought-up complexity originally, I had more or less decided that I did not want to see the new grep imported without significant speed improvements and was concerned with how to improve grep. Your interest is in debating that point (fortunately arguing for the side I agree with :). 4) grep -e 123 456 world.build [I assume grep -e 123 -e 124 world.build] One can clearly see that GNU grep has a much better complexity in the cases of longer patterns or multiple patterns with common prefix. Alright, someone else already mentioned to me in email that I totally ignored what differences involved multiple patterns. Combining multiple patterns is a big win if those two patterns have a common prefix (I hadn't considered the case of similar patterns before, actually). Combining multiple patterns when they're dissimilar doesn't appear to help much (which is the only case I had considered -- my mistake, and also the reason I ignored what you said about multiple patterns). I'm surprised by the way GNU grep is able to handle longer patterns, and I probably wouldn't have noticed it unless I'd taken some time to examine the GNU source. Congratulations, you win. :) The rest of your lengthy message mostly goes on to repeat the fact that GNU grep is able to merge multiple patterns with a common prefix (and postfix?) to good effect. It also shows that the new grep spends a lot of time in an activity not related to the search itself, since it does multiple patterns by Well, duh. This is really why my reaction to complexity analysis is (still) what it is. 
Complexity analysis is almost only useful for comparing two different algorithms and the fact that the new grep spends a lot of time doing things other than pattern searching is quite obvious after a casual perusal of the source. Complexity analysis does not (directly) help improving an algorithm. With the possible exception of the idea of merging common prefixes, most of this is not useful (at this stage) to improving grep. If I was going to propose replacing the existing GNU grep, I would (and always would have) done considerable more speed trials than the simple one in my last message. It would seem that GNU grep is superior in the case of partial matches without a full match too, but the standard deviation for the That is almost certainly something inside the regex library, which I have repeatedly said I'm not interested in even looking at. If our regex library is too slow, then we need to look into the newer one the Henry Spencer is rumoured to be sitting on. -- This is my .signature which gets appended to the end of my messages. To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: replacing grep(1)
On Fri, Jul 30, 1999 at 03:27:20PM +0200, Dag-Erling Smorgrav wrote: it was VERY simple to do... and attached is the patch... this uses the option REG_STARTEND to do what the copy was trying to do... all of the code to use REG_STARTEND was already there, it just needed to be enabled.. Funnily, I experience a near-doubling of running time with similar patches. Strange... His patches made grep on my system much faster than the original 0.10 and almost as fast as GNU grep. b$ /usr/bin/time ./grep-10 -e printer longfile /dev/null 1.16 real 0.97 user 0.19 sys b$ /usr/bin/time ./grep-10-jmg -e printer longfile /dev/null 0.48 real 0.43 user 0.04 sys b$ /usr/bin/time grep -e printer longfile /dev/null 0.28 real 0.09 user 0.18 sys This is one of the original Celerons, FWIW. Once-in-a-while that gives me performance numbers somewhat different from any other Intel. -- This is my .signature which gets appended to the end of my messages. To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: replacing grep(1)
On Fri, Jul 30, 1999 at 03:27:20PM +0200, Dag-Erling Smorgrav wrote:

> Funnily, I experience a near-doubling of running time with similar patches.

Incidentally, it seems that it's not possible to assume that our regex library is even anywhere in the same league as the GNU regex library.

b$ time ./grep -E '(vt100)|(printer)' longfile > /dev/null

real    0m21.284s
user    0m22.034s
sys     0m0.083s

Now, with a profiled executable with optimization turned off it takes about 25 seconds. Regardless, it appears to spend 98% of its time in regexec(), which is good, since that's where it should be spending time. [I had been intending to combine multiple patterns, ultimately combining in a '\n' to avoid the memchr() in mmopen].

b$ time grep '(vt100)|(printer)' longfile > /dev/null

real    0m0.267s
user    0m0.109s
sys     0m0.157s

98% * 20 = ~19... Without an improved regex library, any mildly complicated pattern will bring the new grep to its knees.

This could be the dfa helping GNU grep more than having a better regexp library... Probably both.

I wonder how well the devel/pcre port would do POSIX regular expressions.

--
This is my .signature which gets appended to the end of my messages.
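[Editor's sketch] The two ways of searching for several strings that come up in this thread (one alternation pattern versus one pattern per string) can be set side by side with plain POSIX regcomp()/regexec(). The helper names line_matches(), match_combined() and match_separately() are made up for illustration and are not taken from either grep; this only shows the two call shapes and says nothing about which is faster on a given regex library.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `line` matches the extended regex
 * `pattern`, 0 if it does not, and -1 if the pattern fails to compile.
 * Recompiling per call is wasteful; it just keeps the sketch short.
 */
static int
line_matches(const char *pattern, const char *line)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}

/* One regexec() call with both strings folded into an alternation. */
static int
match_combined(const char *line)
{
	return line_matches("(vt100)|(printer)", line);
}

/* One regexec() call per string, stopping at the first hit or error. */
static int
match_separately(const char *line)
{
	int r = line_matches("vt100", line);

	if (r != 0)
		return r;
	return line_matches("printer", line);
}
```

Both functions give the same answer for any line; timing them over the same input is one way to see how much the alternation itself costs in a particular regex implementation.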
Re: replacing grep(1)
On Thu, Jul 29, 1999 at 09:16:53PM +0900, Daniel C. Sobral wrote:

> > > Sorry, but a simplistic analysis like that just won't cut for grep. The algorithmic complexity is highly relevant here. Try this:
> >
> > Algorithmic complexity!?!
>
> Yup.

I'm sorry. I've read your message and have decided that you're wrong. Outside of the regexp library, algorithmic complexity is not a factor here. It would take a beanbag to write anything other than an O(N) algorithm.

The proposed grep is slow, very slow, and I've sent a long message to James outlining how to make it much faster, but algorithmic complexity is not an issue.

> Also, fgetln() will copy the line buffer from time to time, though that's not a simple computation, and probably of little

fgetln() does a complete copy of the line buffer whenever an excessively long line is found. On this point, it's hard to do better without using mmap(), but mmap() has its own disadvantages. My last suggestion to James was to assume a worst case for long lines and mark the worst worst case with an XXX "this is unfortunate".

> > The test you suggested doesn't show anything about that algorithmic complexity, though.
>
> Yeah? Try to back that with the results of the tests I suggested.

No, it's not even worth my time.

Now look. You've gotten me so upset I actually went and did a simple test. The test showed I'm right and you're wrong. Catting X copies of /etc/termcap into longfile and searching it for all occurrences of "printer" costs grep an extra 0.03 seconds for every repetition of /etc/termcap in longfile. Gee, linear complexity wrt file length. Who could've guessed!? What'ya bet GNU grep also exhibits linear complexity? :)

Admit it, you jumped in with some bullshit about complexity when, had you taken the time to look into what James meant when he said "it now spends 50% of its time in procline()", you would have kept quiet, realizing that he was talking about a constant factor in the complexity analysis, a subject where comments such as "it now spends 50% of its time in procline()" are relevant. :-)

[Never mind that it should be spending near 100% of its time in procline...that just means he's still got work to do... :-]

--
This is my .signature which gets appended to the end of my messages.
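[Editor's sketch] The "cat X copies of /etc/termcap together and search for printer" experiment can be mimicked in memory. The code below is an illustration only: repeat_text() and count_matching_lines() are hypothetical names, the search is a naive fixed-string scan rather than anything either grep does, and no timings are claimed. It shows a single pass whose work grows with the input length, which is the property the test above was measuring.

```c
#include <stdlib.h>
#include <string.h>

/* Build a buffer holding `copies` repetitions of `text`; caller frees it. */
static char *
repeat_text(const char *text, size_t copies, size_t *outlen)
{
	size_t tlen = strlen(text);
	char *buf = malloc(tlen * copies + 1);
	size_t i;

	if (buf == NULL)
		return NULL;
	for (i = 0; i < copies; i++)
		memcpy(buf + i * tlen, text, tlen);
	buf[tlen * copies] = '\0';
	*outlen = tlen * copies;
	return buf;
}

/* Count the lines in buf[0..len) that contain `needle`, in one pass. */
static size_t
count_matching_lines(const char *buf, size_t len, const char *needle)
{
	size_t nlen = strlen(needle), count = 0;
	const char *p = buf, *end = buf + len;

	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));
		const char *eol = (nl != NULL) ? nl : end;
		const char *q;

		/* Naive substring search restricted to the current line. */
		for (q = p; nlen > 0 && (size_t)(eol - q) >= nlen; q++) {
			if (memcmp(q, needle, nlen) == 0) {
				count++;
				break;
			}
		}
		p = (nl != NULL) ? nl + 1 : end;
	}
	return count;
}
```

Wrapping the count_matching_lines() call in clock() for 1, 10 and 50 copies of the same text would give the same kind of proportion check that was proposed earlier in the thread.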
Re: replacing grep(1)
On Thu, 29 Jul 1999, Tim Vanderhoek wrote:

> fgetln() does a complete copy of the line buffer whenever an excessively long line is found. On this point, it's hard to do better without using mmap(), but mmap() has its own disadvantages. My last suggestion to James was to assume a worst case for long lines and mark the worst worst case with an XXX "this is unfortunate".

<warning type="Anything said here wrong is my fault, not DES's">
DES tells me he has a new version (0.10) which mmap()s. It supposedly cuts the run time down significantly; I do not have the numbers in front of me. Unfortunately he has not posted this version yet so I cannot download it and run it myself. He also says that if mmap fails, he drops back to stdio. This should only happen in the NFS case, the 2G case, etc.
</warning>

> [Never mind that it should be spending near 100% of its time in procline...that just means he's still got work to do... :-]

I'd rather see it spending 100% of its time in regexec(), then I can just blame Henry Spencer :)

Someone said there was new regex code out, is this true? Can anyone with a copy test grep with it?

Jamie
Re: replacing grep(1)
On Thu, Jul 29, 1999 at 07:05:57PM -0400, James Howard wrote:

> <warning type="Anything said here wrong is my fault, not DES's">
> DES tells me he has a new version (0.10) which mmap()s. It supposedly cuts the run time down significantly, I do not have the numbers in front of me.

I do. Still far too slow. I'll work on this tomorrow, since that seems the only way to convince people that mmap is not such a big win. :-(

Hmm... Maybe I'll even turn out to be wrong. ;-) I really believe mmap falls into the category of "might be nice, but not necessary and does complicate things..."

--
This is my .signature which gets appended to the end of my messages.
Re: replacing grep(1)
:of me. Unfortunately he has not posted this version yet so I cannot
:download it and run it myself. He also says that if mmap fails, he drops
:back to stdio. This should only happen in the NFS case, the 2G case,
:etc.

It should only be the 2G case or the pipe case. mmap() works just fine over NFS.

I would not expect a huge speed increase using mmap over read. mmap() tends to be a lot harder on the system than read() (though we are working on that), especially if you are scanning large files.

Avoiding buffer copies is good, but keep in mind that the cost of accessing a location in memory is essentially 0 if the memory is already in the L1 cache. So while you may get an improvement going from read() to mmap(), which avoids large buffer copies, you will not see much of an improvement removing redundancy from the line scan.

-Matt
Matthew Dillon [EMAIL PROTECTED]
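[Editor's sketch] The "mmap() if possible, otherwise fall back" arrangement described in this exchange might look roughly like the following. This is not the code from grep 0.10: count_lines_in_file() and count_newlines() are hypothetical names, the fallback uses read() rather than stdio, and the 64 KB buffer size is an arbitrary choice. The mapping is skipped for anything that is not a non-empty regular file, which covers the pipe case mentioned above.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Count '\n' bytes in p[0..n) using memchr(). */
static long
count_newlines(const char *p, size_t n)
{
	const char *end = p + n, *nl;
	long count = 0;

	while (p < end && (nl = memchr(p, '\n', (size_t)(end - p))) != NULL) {
		count++;
		p = nl + 1;
	}
	return count;
}

/* Returns the number of newlines in the file, or -1 on error. */
static long
count_lines_in_file(const char *path)
{
	char buf[65536];
	struct stat st;
	long total = 0;
	ssize_t n;
	int fd;

	if ((fd = open(path, O_RDONLY)) < 0)
		return -1;

	/* Try to map regular, non-empty files in one go. */
	if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
		void *map = mmap(NULL, (size_t)st.st_size, PROT_READ,
		    MAP_PRIVATE, fd, 0);
		if (map != MAP_FAILED) {
			total = count_newlines(map, (size_t)st.st_size);
			munmap(map, (size_t)st.st_size);
			close(fd);
			return total;
		}
	}

	/* Fallback: pipes, empty files, or a failed mmap(). */
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += count_newlines(buf, (size_t)n);
	close(fd);
	return (n < 0) ? -1 : total;
}
```

The two paths share the same scanning loop, so any difference measured between them comes from how the bytes reach the process, which is the comparison being argued about here.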
Re: replacing grep(1)
James Howard scribbled this message on Jul 29:

> On Thu, 29 Jul 1999, Tim Vanderhoek wrote:
>
> > fgetln() does a complete copy of the line buffer whenever an excessively long line is found. On this point, it's hard to do better without using mmap(), but mmap() has its own disadvantages. My last suggestion to James was to assume a worst case for long lines and mark the worst worst case with an XXX "this is unfortunate".
>
> <warning type="Anything said here wrong is my fault, not DES's">
> DES tells me he has a new version (0.10) which mmap()s. It supposedly cuts the run time down significantly, I do not have the numbers in front of me. Unfortunately he has not posted this version yet so I cannot download it and run it myself. He also says that if mmap fails, he drops back to stdio. This should only happen in the NFS case, the 2G case, etc.
> </warning>
>
> > [Never mind that it should be spending near 100% of its time in procline...that just means he's still got work to do... :-]
>
> I'd rather see it spending 100% of its time in regexec(), then I can just blame Henry Spencer :)
>
> Someone said there was new regex code out, is this true? Can anyone with a copy test grep with it?

ok, I just made a patch to eliminate the copy that was happening in procfile, and it sped up a grep of a 5meg termcap from about 2.9sec down to .6 seconds... this includes time spent profiling the program.. GNU grep w/o profiling only takes .15sec so we ARE getting closer to GNU grep...

it was VERY simple to do... and attached is the patch... this uses the option REG_STARTEND to do what the copy was trying to do... all of the code to use REG_STARTEND was already there, it just needed to be enabled..

enjoy!

--
John-Mark Gurney Voice: +1 541 684 8449
Cu Networking P.O. Box 5693, 97405
"The soul contains in itself the event that shall presently befall it. The event is only the actualizing of its thought." -- Ralph Waldo Emerson

diff -u grep-0.10.orig/util.c grep-0.10/util.c
--- grep-0.10.orig/util.c	Thu Jul 29 05:00:15 1999
+++ grep-0.10/util.c	Thu Jul 29 16:38:06 1999
@@ -93,7 +93,6 @@
 	file_t *f;
 	str_t ln;
 	int c, t, z;
-	char *tmp;
 
 	if (fn == NULL) {
 		fn = "(standard input)";
@@ -119,13 +118,8 @@
 		initqueue();
 	for (c = 0; !(lflag && c);) {
 		ln.off = grep_tell(f);
-		if ((tmp = grep_fgetln(f, &ln.len)) == NULL)
+		if ((ln.dat = grep_fgetln(f, &ln.len)) == NULL)
 			break;
-		ln.dat = grep_malloc(ln.len + 1);
-		memcpy(ln.dat, tmp, ln.len);
-		ln.dat[ln.len] = 0;
-		if (ln.len > 0 && ln.dat[ln.len - 1] == '\n')
-			ln.dat[--ln.len] = 0;
 		ln.line_no++;
 
 		z = tail;
@@ -133,7 +127,6 @@
 			enqueue(&ln);
 			linesqueued++;
 		}
-		free(ln.dat);
 		c += t;
 	}
 	if (Bflag > 0)
@@ -174,7 +167,8 @@
 	pmatch.rm_so = 0;
 	pmatch.rm_eo = l->len;
 	for (c = i = 0; i < patterns; i++) {
-		r = regexec(&r_pattern[i], l->dat, 0, &pmatch, eflags);
+		r = regexec(&r_pattern[i], l->dat, 0, &pmatch,
+		    eflags | REG_STARTEND);
 		if (r == REG_NOMATCH && t == 0)
 			continue;
 		if (wflag && r == 0) {
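[Editor's sketch] For readers who have not used REG_STARTEND: it is a BSD extension to regexec() (also available in glibc, but not part of POSIX) that takes the region to search from pmatch[0].rm_so and pmatch[0].rm_eo, so the subject does not have to be NUL-terminated. That is what lets the patch above drop the malloc()/memcpy() per line. The helper match_range() below is a hypothetical name; on systems without the flag it falls back to the copy the patch removed, purely so the sketch still builds.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 1 if `re` matches somewhere inside buf[0..len), 0 if not,
 * and -1 if memory runs out in the fallback path. `buf` does not need
 * to be NUL-terminated.
 */
static int
match_range(const regex_t *re, const char *buf, size_t len)
{
	regmatch_t pm;
#ifdef REG_STARTEND
	/* Delimit the search region instead of terminating the string. */
	pm.rm_so = 0;
	pm.rm_eo = (regoff_t)len;
	return regexec(re, buf, 1, &pm, REG_STARTEND) == 0;
#else
	/* No REG_STARTEND here: copy and terminate, as the old code did. */
	char *copy = malloc(len + 1);
	int rc;

	if (copy == NULL)
		return -1;
	memcpy(copy, buf, len);
	copy[len] = '\0';
	rc = regexec(re, copy, 1, &pm, 0);
	free(copy);
	return rc == 0;
#endif
}
```

Note that with this flag the match offsets reported in pmatch are still relative to the start of `buf`, not to rm_so, which matters once the region does not begin at offset 0.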
Re: replacing grep(1)
Tim Vanderhoek wrote:

> On Thu, Jul 29, 1999 at 01:59:45AM +0900, Daniel C. Sobral wrote:
>
> > Sorry, but a simplistic analysis like that just won't cut for grep. The algorithmic complexity is highly relevant here. Try this:
>
> Algorithmic complexity!?!

Yup.

> It's a freaking grep application. There is no freaking algorithmic complexity.

Pattern matching is one of the prime examples of algorithmic complexity. You can add complexity very trivially.

> At least not outside of our regex library, anyways.

I had not looked at the source, so I didn't know exactly how the application did its stuff. Now I did, and I'll comment.

Let's say the number of patterns is N, and the total number of characters to be examined is S. Let's call the unmodified complexity C, just for the sake of simplifying comparison using a dangerous simplification.

First, the new grep uses fgetln(). fgetln() searches for a new line. So each character is examined (at least) twice. That's C+S*read already. GNU grep uses mmap() (or read(), but not in FreeBSD), so it doesn't incur this additional complexity.

Also, fgetln() will copy the line buffer from time to time, though that's not a simple computation, and probably of little significance.

In addition to that, the new grep copies the fgetln() result each time. Add S*copy to C.

Next, the new grep tests the lines against each pattern separately! GNU grep doesn't.

That's just *outside* the regexp library. Now, whether the complexity is inside or outside the regexp library, I don't care. It's complexity all the same. So it *must* be factored in.

> The test you suggested doesn't show anything about that algorithmic complexity, though.

Yeah? Try to back that with the results of the tests I suggested.

> If we have a slow regex library, though, I would consider that a separate problem from a slow grep.

If the f*cking grep is f*cking slow, I don't give a f*ck where the problem is located! It just *IS*. GNU grep uses gnu regexp library, the new grep uses our own. If changing greps means changing to a library whose algorithm complexity is greater, then that *DOES* count against the change.

For instance, a quick browse over GNU grep shows the gnu regexp library can factor in multiple patterns. That is not being done by the new grep. Does our regexp library support that?

Now, here is a quick and dirty fix for the repeated malloc()/free(). Notice that this is what fgetln() does, in fact. I'm afraid, though, that this is not anywhere near what would be needed to put the new grep in the league of GNU grep.

I like the idea of a readable code, I like the idea of a BSD license, but it would be damn silly to replace a clearly superior grep, and that's where the thing stands right now.

--- util.c.orig	Thu Jul 29 19:14:17 1999
+++ util.c	Thu Jul 29 20:49:16 1999
@@ -107,6 +107,8 @@
 
 	ln.file = fn;
 	ln.line_no = 0;
+	ln.bufsize = 81;	/* Magical constants, yeah! */
+	ln.dat = grep_malloc(81);
 	linesqueued = 0;
 
 	if (Bflag > 0)
@@ -115,11 +117,14 @@
 		ln.off = grep_tell();
 		if ((tmp = grep_getln(&ln.len)) == NULL)
 			break;
-		ln.dat = grep_malloc(ln.len + 1);
+		if (ln.bufsize < ln.len + 1)
+			ln.dat = grep_realloc(ln.dat, ln.len + 1);
 		memcpy(ln.dat, tmp, ln.len);
-		ln.dat[ln.len] = 0;
 		if (ln.len > 0 && ln.dat[ln.len - 1] == '\n')
 			ln.dat[--ln.len] = 0;
+		else
+			ln.dat[ln.len] = 0;
+
 		ln.line_no++;
 
 		z = tail;
@@ -127,9 +132,9 @@
 			enqueue(&ln);
 			linesqueued++;
 		}
-		free(ln.dat);
 		c += t;
 	}
+	free(ln.dat);
 	if (Bflag > 0)
 		clearqueue();
 	grep_close();
--- grep.h.orig	Thu Jul 29 20:47:52 1999
+++ grep.h	Thu Jul 29 20:48:34 1999
@@ -35,6 +35,7 @@
 
 typedef struct {
 	size_t		 len;
+	size_t		 bufsize;
 	int		 line_no;
 	int		 off;
 	char		*file;

--
Daniel C. Sobral (8-DCS) d...@newsguy.com d...@freebsd.org
"Is it true that you're a millionaire's son who never worked a day in your life?" "Yeah, I guess so." "Lemme tell you, son, you ain't missed a thing."
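[Editor's sketch] The idea in the patch above, keeping one line buffer alive across iterations and growing it only when a longer line shows up, can be written as a small standalone routine. As far as the posted diff shows, ln.bufsize is set once to 81 and never updated after grep_realloc(), so this sketch records the new capacity explicitly. struct linebuf and linebuf_set() are hypothetical names; the initial size of 81 is borrowed from the patch and the doubling policy is an assumption of this sketch, not something the patch does.

```c
#include <stdlib.h>
#include <string.h>

/* A reusable line buffer: data, current length, allocated capacity. */
struct linebuf {
	char	*dat;
	size_t	 len;
	size_t	 bufsize;
};

/*
 * Copy src[0..n) into lb, dropping one trailing newline and adding a
 * terminating NUL. Grows the buffer only when it is too small.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int
linebuf_set(struct linebuf *lb, const char *src, size_t n)
{
	if (lb->bufsize < n + 1) {
		size_t want = (lb->bufsize != 0) ? lb->bufsize : 81;
		char *p;

		while (want < n + 1)
			want *= 2;
		if ((p = realloc(lb->dat, want)) == NULL)
			return -1;
		lb->dat = p;
		lb->bufsize = want;	/* remember the new capacity */
	}
	memcpy(lb->dat, src, n);
	if (n > 0 && lb->dat[n - 1] == '\n')
		n--;
	lb->dat[n] = '\0';
	lb->len = n;
	return 0;
}
```

Starting from a zeroed struct linebuf, the first call allocates and later calls reuse the storage; a single free(lb.dat) after the loop replaces the per-line malloc()/free() pair.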
Re: replacing grep(1)
Tim Vanderhoek scribbled this message on Jul 29:

> On Thu, Jul 29, 1999 at 07:05:57PM -0400, James Howard wrote:
>
> > <warning type="Anything said here wrong is my fault, not DES's">
> > DES tells me he has a new version (0.10) which mmap()s. It supposedly cuts the run time down significantly, I do not have the numbers in front of me.
>
> I do. Still far too slow. I'll work on this tomorrow, since that seems the only way to convince people that mmap is not such a big win. :-(

I just managed to get a fivefold speed increase by removing an unnecessary copy... and now, grep spends 50% of its time in regexec, 37.2% of its time in mmfgetln, and this is because of the scanning for a newline character...

> Hmm... Maybe I'll even turn out to be wrong. ;-) I really believe mmap falls into the category of "might be nice, but not necessary and does complicate things..."

I think it is a big win... it shaved off around a half second from 3 seconds down to 2 and a half seconds...

--
John-Mark Gurney Voice: +1 541 684 8449
Cu Networking P.O. Box 5693, 97405
"The soul contains in itself the event that shall presently befall it. The event is only the actualizing of its thought." -- Ralph Waldo Emerson
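[Editor's sketch] The newline scan that shows up in the profile above is, in essence, a memchr() walk over the input buffer that hands back a pointer and a length for each line without copying it. The version below is a guess at the general shape, not the actual mmfgetln() from grep 0.10; struct span and next_line() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* A line inside a larger buffer: not NUL-terminated, not owned. */
struct span {
	const char	*dat;
	size_t		 len;
};

/*
 * Fetch the next line of buf[0..buflen) starting at *pos. On success,
 * fills `out` (newline excluded), advances *pos past the newline and
 * returns 1. Returns 0 once the buffer is exhausted.
 */
static int
next_line(const char *buf, size_t buflen, size_t *pos, struct span *out)
{
	const char *start, *nl;

	if (*pos >= buflen)
		return 0;
	start = buf + *pos;
	nl = memchr(start, '\n', buflen - *pos);
	out->dat = start;
	out->len = (nl != NULL) ? (size_t)(nl - start) : buflen - *pos;
	*pos += out->len + ((nl != NULL) ? 1 : 0);
	return 1;
}
```

Each byte is still visited once by memchr() and again by the matcher, which is the double examination discussed in this thread; the spans it produces can be passed straight to regexec() with REG_STARTEND.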
Re: replacing grep(1)
Sheldon Hearn [EMAIL PROTECTED] writes:

> In this case, the implementation we'll be introducing will introduce a performance loss, not a gain.

Can you document that?

> As far as stability goes, there's a loss involved _if_ passing the GNU grep regression tests is important.

Do you mean that Jamie's implementation doesn't pass those regression tests? If it doesn't, we can fix it before importing it into the tree.

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: replacing grep(1)
On Wed, Jul 28, 1999 at 03:30:58AM -0400, Dag-Erling Smorgrav wrote:

> > There seems to be at least one dependency on GNU grep in /ports/Mk/bsd.port.mk where the -F argument is used.
>
> -F is implemented.

I saw that, but had assumed the semantics were different. I should have read the manpages more closely: they're not. Sorry.

--
This is my .signature which gets appended to the end of my messages.
Re: replacing grep(1)
In message [EMAIL PROTECTED] "David O'Brien" writes:

: Before importing, it must display a version number of 1.0 (or drop the
: version number). This is not Linux where everything is version 0.xy.

For a long time the new boot loader was in the tree with a version 0.xx...

Warner
Re: replacing grep(1)
I expect that there is a very good reason why this shouldn't be done, but could it be possible to implement two different algorithms/code dependent on the size of the file being grepped?

Mark Dickey [EMAIL PROTECTED]

Daniel C. Sobral wrote:

> James Howard wrote:
>
> > Due to the discussion of speed, I have been looking at it and it is really slow. Even slower than I thought and I was thinking it was pretty slow. So using gprof, I have discovered that it seems to spend a whole mess of time in grep_malloc() and free(). So I pulled all the references to malloc inside the main loop (the copy for ln.dat and removed queueing). This still leaves us with a grep that is about ~6x slower than GNU. Before that, it ran closer to 80x.
> >
> > After this, gprof says it spends around 53% of its time in procline().
>
> Sorry, but a simplistic analysis like that just won't cut for grep. The algorithmic complexity is highly relevant here. Try this:
>
> generate a 1 Mb file, and then generate 10 Mb and 50 Mb files by concatenating that first file. Benchmark yours and GNU grep a number of times to get the average for each file. Now compare the *proportions* between the different sized files. Are they the same?
>
> Next, try different sized patterns on the 50 Mb file on both yours and GNU grep. Again, compare the proportion.
>
> Next, compare patterns with different number of "wildcards", patterns with things like [acegikmoqsuvxz] vs [acegikmoqsuvxzACEGIKMOQSUVXZ], etc.
>
> Either that, or do a complexity analysis of the algorithms. :-)
>
> (In case anyone reading this discussion wants to know more about complexity of algorithms, I recommend Computer Algorithms, Introduction to Design and Analysis, by Sara Baase, Addison Wesley.)
>
> --
> Daniel C. Sobral (8-DCS) [EMAIL PROTECTED] [EMAIL PROTECTED]
> "Is it true that you're a millionaire's son who never worked a day in your life?" "Yeah, I guess so." "Lemme tell you, son, you ain't missed a thing."
Re: replacing grep(1)
Doug [EMAIL PROTECTED] wrote:

> The more complete the feature set, the better off we are for my money.

Someone offering money? Quick, who's got the donations hat... :-)

Peter
Re: replacing grep(1)
On Thu, Jul 29, 1999 at 01:59:45AM +0900, Daniel C. Sobral wrote:

> Sorry, but a simplistic analysis like that just won't cut for grep. The algorithmic complexity is highly relevant here. Try this:

Algorithmic complexity!?!

It's a freaking grep application. There is no freaking algorithmic complexity. At least not outside of our regex library, anyways. The test you suggested doesn't show anything about that algorithmic complexity, though.

If we have a slow regex library, though, I would consider that a separate problem from a slow grep.

--
This is my .signature which gets appended to the end of my messages.
Re: replacing grep(1)
Brian F. Feldman gr...@freebsd.org writes:

> That's true. I'd like to see the replacement grep do mmaping of the input files if it doesn't already, as that would speed it up.

Shouldn't be too hard to implement, the way file operations are abstracted. Patches? :)

DES
--
Dag-Erling Smorgrav - d...@yes.no
Re: replacing grep(1)
Tim Vanderhoek vand...@ecf.utoronto.ca writes:

> Have you run your systems with J-grep as a replacement for GNU grep for a while (making sure nothing breaks)?

Yes.

> There seems to be at least one dependency on GNU grep in /ports/Mk/bsd.port.mk where the -F argument is used.

-F is implemented.

DES
--
Dag-Erling Smorgrav - d...@yes.no
Re: replacing grep(1)
James Howard wrote:

> Due to the discussion of speed, I have been looking at it and it is really slow. Even slower than I thought and I was thinking it was pretty slow. So using gprof, I have discovered that it seems to spend a whole mess of time in grep_malloc() and free(). So I pulled all the references to malloc inside the main loop (the copy for ln.dat and removed queueing). This still leaves us with a grep that is about ~6x slower than GNU. Before that, it ran closer to 80x.
>
> After this, gprof says it spends around 53% of its time in procline().

Sorry, but a simplistic analysis like that just won't cut for grep. The algorithmic complexity is highly relevant here. Try this:

generate a 1 Mb file, and then generate 10 Mb and 50 Mb files by concatenating that first file. Benchmark yours and GNU grep a number of times to get the average for each file. Now compare the *proportions* between the different sized files. Are they the same?

Next, try different sized patterns on the 50 Mb file on both yours and GNU grep. Again, compare the proportion.

Next, compare patterns with different number of wildcards, patterns with things like [acegikmoqsuvxz] vs [acegikmoqsuvxzACEGIKMOQSUVXZ], etc.

Either that, or do a complexity analysis of the algorithms. :-)

(In case anyone reading this discussion wants to know more about complexity of algorithms, I recommend Computer Algorithms, Introduction to Design and Analysis, by Sara Baase, Addison Wesley.)

--
Daniel C. Sobral (8-DCS) d...@newsguy.com d...@freebsd.org
"Is it true that you're a millionaire's son who never worked a day in your life?" "Yeah, I guess so." "Lemme tell you, son, you ain't missed a thing."
Re: replacing grep(1)
I expect that there is a very good reason why this shouldn't be done, but would it be possible to implement two different algorithms, chosen depending on the size of the file being grepped?

Mark Dickey
m...@bestweb.net
Re: replacing grep(1)
Doug d...@gorean.org wrote:

> The more complete the feature set, the better off we are for my money.

Someone offering money? Quick, who's got the donations hat... :-)

Peter
Re: replacing grep(1)
On Thu, Jul 29, 1999 at 01:59:45AM +0900, Daniel C. Sobral wrote:

> Sorry, but a simplistic analysis like that just won't cut it for grep. The algorithmic complexity is highly relevant here. Try this:

Algorithmic complexity!?! It's a freaking grep application. There is no freaking algorithmic complexity. At least not outside of our regex library, anyways. The test you suggested doesn't show anything about that algorithmic complexity, though.

If we have a slow regex library, though, I would consider that a separate problem from a slow grep.

--
This is my .signature which gets appended to the end of my messages.
Re: replacing grep(1)
On 27 Jul 1999 13:37:35 +0200, Dag-Erling Smorgrav wrote:

> URL:http://www.freebsd.org/~des/software/grep-0.7.tar.gz
>
> I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties.

When I committed the port (textproc/freegrep), Jamie assured me that he'd keep me updated on the progress of his software. That was the last I heard of it, and the port is still sitting at version 0.3.

Version 0.3 broke port-building badly. Does version 0.7 make it through a build of a whole stack of ports?

Ciao,
Sheldon.
Re: replacing grep(1)
Sheldon Hearn [EMAIL PROTECTED] writes:

> Version 0.3 broke port-building badly. Does version 0.7 make it through a build of a whole stack of ports?

Yes.

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: replacing grep(1)
It seems Dag-Erling Smorgrav wrote:

> Jamie Howard ([EMAIL PROTECTED]), with a little help from yours truly, has written a BSD-licensed version of grep(1) which has all the functionality of our current (GPLed) implementation, plus a little more, in one seventh the source code and one fourth the binary code. What's more, the code is actually possible for mere mortals to read and understand.
>
> The source code is available for download from freefall:
>
> URL:http://www.freebsd.org/~des/software/grep-0.7.tar.gz
>
> I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties.

Go for it, the more GNU stuff we nuke the better :)

-Søren
Re: replacing grep(1)
On Tue, 27 Jul 1999, Soren Schmidt wrote:

> It seems Dag-Erling Smorgrav wrote:
> > I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties.
>
> Go for it, the more GNU stuff we nuke the better :)
>
> -Søren

Geez, why don't we just write our own compiler and linker, assembler, and everything? Let's get every last bit of GNU out of our system, for no reason! This kind of NIH is not necessary, and only hurts us by misdirecting our energies.

/joking

Seriously, I'd love for this to happen. Most GNU software is a hopeless, gruesome mess that should be dragged out and shot. Getting rid of as much as possible, gradually, is a Very Good Thing; this is how we get stability and performance improvements.

Brian Fundakowski Feldman
[EMAIL PROTECTED]
FreeBSD: The Power to Serve! http://www.FreeBSD.org/
Re: replacing grep(1)
On Tue, Jul 27, 1999 at 01:37:35PM +0200, Dag-Erling Smorgrav wrote:

> I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties.

Have you run your systems with J-grep as a replacement for GNU grep for a while (making sure nothing breaks)? There seems to be at least one dependency on GNU grep in /ports/Mk/bsd.port.mk where the -F argument is used.

How does it compare in speed? [I'd test it myself, but see my private email...]

--
This is my .signature which gets appended to the end of my messages.
Re: replacing grep(1)
On Tue, 27 Jul 1999 08:19:38 -0400, "Brian F. Feldman" wrote:

> Getting rid of as much as possible, gradually, is a Very Good Thing; this is how we get stability and performance improvements.

Only if the replacements are as stable and robust as their predecessors. In this case, the implementation we'll be introducing will bring a performance loss, not a gain. As far as stability goes, there's a loss involved _if_ passing the GNU grep regression tests is important.

Don't get me wrong. I'm all for replacing GNU software. Let's just be realistic and keep in mind that being non-GNU doesn't necessarily mean better.

In this case, I'm all for the change, since I don't use grep for serious regex work and the readability gain outweighs any loss of performance. You probably feel the same way. Our opinions are those of developers, though. It's always worth remembering that.

Ciao,
Sheldon.
Re: replacing grep(1)
On Tue, 27 Jul 1999, Nickolay N. Dudorov wrote:

> After making it on the CURRENT system I can only see:
>
> grep: filename: Undefined error: 0
>
> for every filename.

Every file?

> This is caused by the very "unusual" return values of 'grep_open' (and the other '..._open' functions), which are declared as 'int' (and return an int result) but are compared with NULL ;-( I prefer not to include the patch for this because I am incompatible with such tricks as:
>
> return ((f = fopen(path, mode)) != NULL) - 1;

It was done this way because gzopen and fopen return pointers of different types. Maybe the best thing would be to have grep_open() return a void pointer, since procfile() doesn't keep track of which files are open and which are not. This is ugly and not very reusable, but then again how many programs need transparent access to both gzip'd and plaintext files?

Jamie
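To make the disagreement concrete, here is a small sketch of what the quoted expression evaluates to. `open_as_int` and `caller_thinks_it_failed` are stand-ins written for this illustration, not the actual code from grep-0.7; only the one-line return expression is taken from the message above.

```c
#include <stdio.h>

static FILE *f;	/* stand-in for wherever the real code keeps its handle */

/*
 * Illustration only: the same expression as the quoted line.
 * (f != NULL) is 1 on success and 0 on failure, so subtracting 1
 * yields 0 on success and -1 on failure.
 */
static int
open_as_int(const char *path, const char *mode)
{
	return ((f = fopen(path, mode)) != NULL) - 1;
}

/*
 * A caller that treats the result like a pointer ("NULL means failure")
 * is effectively testing for 0, which is the success value here.
 */
static int
caller_thinks_it_failed(int result)
{
	return result == 0;
}
```

Under this reading, a successful open looks like a failure to such a caller, and because errno was never set, the diagnostic presumably ends in "Undefined error: 0". Returning the handle itself (for instance through a void pointer, as floated above), or testing the int against -1, would avoid the mismatch.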
Re: replacing grep(1)
On Tue, Jul 27, 1999 at 08:23:44AM -0400, Tim Vanderhoek wrote:

> How's it compare in speed? [I'd test it myself, but see my private email...]

Okay, following up on myself, and indirectly Sheldon: it does seem a little too slow. I'm not sure that this is because it doesn't use mmap. Supposedly the merged buffer/vm means mmap doesn't make as large a difference as it used to.

On a file with 10+ lines, the speed difference is rather restrictive.

Looking over the gprof output, I think its authors (or some other intrepid hacker) will find ways to speed it up. Only about 10% of the time is spent in procline(). There seems to be a lot of unnecessary strncpy() that could be _easily_ avoided if the free() on util.c:130 was avoided, but I'll let the authors speak first. :-)

--
This is my .signature which gets appended to the end of my messages.
Re: replacing grep(1)
On Tue, 27 Jul 1999 23:18:14 +0900, "Daniel C. Sobral" wrote:

> I'm talking about cpdup, which can be found in http://www.backplane.com/FreeBSD/. Someone posted a port at the time, but I don't know if anyone ever committed the port.

I'll commit a port in the next few days.

Ciao,
Sheldon.
Re: replacing grep(1)
At 9:29 AM -0400 7/27/99, Tim Vanderhoek wrote:

> On a file with 10+ lines, the speed difference is rather restrictive. [...] Only about 10% of the time is spent in procline(). There seems to be a lot of unnecessary strncpy() that could be _easily_ avoided if the free() on util.c:130 was avoided, but I'll let the authors speak first. :-)

Hmm, strncpy? Are these calls which really want strncpy for what it was originally designed for, or are they just trying to prevent buffer overruns?

If it's the buffer-overrun answer, then maybe this would be a good test case for using strlcpy instead of strncpy, to see if it makes a performance difference (since the code won't waste its time nulling out bytes that don't need to be nulled out).

---
Garance Alistair Drosehn = [EMAIL PROTECTED]
Senior Systems Programmer or [EMAIL PROTECTED]
Rensselaer Polytechnic Institute
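The padding behaviour mentioned above can be seen in a few lines of C. This is a sketch under assumptions: `bounded_copy` is a hypothetical stand-in with strlcpy-like semantics, written out so the example does not depend on the platform providing strlcpy, and `bytes_zeroed_by_strncpy` exists only to make the padding visible.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in with strlcpy-like semantics: copy at most
 * size - 1 bytes, always NUL-terminate (if size > 0), touch nothing
 * past the terminator, and return strlen(src).
 */
static size_t
bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size > 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/*
 * Fill buf with 'X', strncpy src into it, and count how many bytes
 * ended up as NUL.  strncpy pads the rest of the buffer with NULs
 * whenever src is shorter than size.
 */
static size_t
bytes_zeroed_by_strncpy(char *buf, size_t size, const char *src)
{
	size_t zeros = 0;

	memset(buf, 'X', size);
	strncpy(buf, src, size);
	for (size_t i = 0; i < size; i++)
		if (buf[i] == '\0')
			zeros++;
	return zeros;
}
```

Copying a 3-byte string into a 64-byte buffer, strncpy writes all 64 bytes while the bounded copy writes 4. Whether that shows up in grep's profile would still have to be measured.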
Re: replacing grep(1)
> Jamie Howard ([EMAIL PROTECTED]), with a little help from yours truly, has written a BSD-licensed version of grep(1) which has all the functionality of our current (GPLed) implementation, plus a little more, in one seventh the source code and one fourth the binary code.
>
> I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties.

A couple of general problems:

o Too many diagnostics have "Undefined error: 0" appended. Particularly in the case of "err(2, re_error)" in file.c, you probably want to look at using errx() instead.

o Errors other than "no match" need to return an exit status of 2: some in file.c and util.c are returning 1.

A more general concern is whether Henry Spencer's regex routines -- at least in our present "alpha-quality" version -- are up to supporting a grep without much further debugging.

I don't recall many of the problems I found when I last looked at these, though here are two, after 5 minutes playing:

echo xx | grep '\(x\{1,2\}\)\1'
echo x | grep '[--x]'

--
Robert Nordier
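For anyone who wants to try those two patterns against the C library's regex routines directly, without grep in between, a small harness along these lines may help. It is a sketch: `bre_matches` is a hypothetical helper name, and it relies only on the POSIX regcomp(), regexec() and regfree() calls.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile `pattern` as a POSIX basic regular
 * expression (cflags of 0) and test it against `text`.
 * Returns 1 on a match, -1 if the pattern fails to compile,
 * and 0 otherwise.
 */
static int
bre_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, 0) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

The two cases from the message would be `bre_matches("\\(x\\{1,2\\}\\)\\1", "xx")` and `bre_matches("[--x]", "x")`. What they return depends on the regex library being linked, which is the point of the exercise.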
Re: replacing grep(1)
On Tue, 27 Jul 1999, Brian F. Feldman wrote:

> On Tue, 27 Jul 1999, Soren Schmidt wrote:
> > It seems Dag-Erling Smorgrav wrote:
> > > I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties.
> >
> > Go for it, the more GNU stuff we nuke the better :)
> >
> > -Søren
>
> Geez, why don't we just write our own compiler and linker, assembler, and everything? Let's get every last bit of GNU out of our system, for no reason! This kind of NIH is not necessary, and only hurts us by misdirecting our energies.
>
> /joking

Actually, there is a difference between grep and gcc. You wouldn't ship cc on a binary-only embedded system, but you might want to ship grep (so that control scripts can use it).

> Seriously, I'd love for this to happen. Most GNU software is a hopeless, gruesome mess that should be dragged out and shot. Getting rid of as much as possible, gradually, is a Very Good Thing; this is how we get stability and performance improvements.
>
> Brian Fundakowski Feldman
> [EMAIL PROTECTED]
> FreeBSD: The Power to Serve! http://www.FreeBSD.org/
Re: replacing grep(1)
On 27 Jul 1999, Dag-Erling Smorgrav wrote:

> I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties.

First, I'm all for this idea, and applaud you and Jamie for taking it on. I do have a few questions. Does POSIX say anything about grep, and if so, is this version compliant? Also, I'd like to put in another vote for full GNU grep feature compliance, since while having our own code is a good thing, I am against introducing gratuitous differences since I have enough of those to deal with already.

I think ports building is a good test, but has anyone tested it with RCS yet? IIRC RCS is heavily dependent on GNU grep, diff and patch. I don't think CVS is dependent on external programs anymore though.

I use grep heavily in day to day administration tasks so I look forward to giving this a try.

Doug
--
On account of being a democracy and run by the people, we are the only nation in the world that has to keep a government four years, no matter what it does.
-- Will Rogers
Re: replacing grep(1)
On Tue, 27 Jul 1999, Doug wrote:

> First, I'm all for this idea, and applaud you and Jamie for taking it on. I do have a few questions. Does POSIX say anything about grep, and if so, is this version compliant? Also, I'd like to put in another vote for full GNU grep feature compliance, since while having our own code is a good thing, I am against introducing gratuitous differences since I have enough of those to deal with already.

I do not have a copy of POSIX, but I do have Unix98, which is a superset of POSIX. Right now, excluding bugs, it is Unix98 and therefore POSIX compliant except for -e. -e should permit multiple patterns, and it never occurred to me that anyone would want to do this. When used with -F, multiple patterns are accepted.

> I use grep heavily in day to day administration tasks so I look forward to giving this a try.

Cool, d/l it and post a bug-list :)

Jamie
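The multiple-pattern behaviour of -e can be sketched in a few lines: a line is selected if it matches any one of the patterns. The code below is an illustration built on the POSIX regex calls, not the approach taken in grep-0.7; `compile_all` and `line_matches_any` are hypothetical names.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical sketch: compile n pattern strings into out[].
 * Returns 0 on success; on failure frees what was compiled so far
 * and returns -1.
 */
static int
compile_all(regex_t *out, const char *const *src, size_t n, int cflags)
{
	for (size_t i = 0; i < n; i++) {
		if (regcomp(&out[i], src[i], cflags) != 0) {
			while (i > 0)
				regfree(&out[--i]);
			return -1;
		}
	}
	return 0;
}

/* A line is selected if any one of the compiled patterns matches it. */
static int
line_matches_any(const regex_t *patterns, size_t npatterns, const char *line)
{
	for (size_t i = 0; i < npatterns; i++)
		if (regexec(&patterns[i], line, 0, NULL, 0) == 0)
			return 1;
	return 0;
}
```

Each -e argument would add one entry to the pattern array, and the per-line test stays a simple loop. Trying every pattern in turn costs one regexec() per pattern per line, so this favours simplicity over speed.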
Re: replacing grep(1)
On Tue, 27 Jul 1999, Jamie Howard wrote:

> I do not have a copy of POSIX, but I do have Unix98, which is a superset of POSIX. Right now, excluding bugs, it is Unix98 and therefore POSIX compliant

Good news, thanks for addressing this concern.

> except for -e. -e should permit multiple patterns, and it never occurred to me that anyone would want to do this.

Ah, well, if the world were limited to just what I could imagine, how boring would that be? The more complete the feature set, the better off we are for my money.

Doug
Re: replacing grep(1)
On Tue, 27 Jul 1999, Doug wrote:

> Ah, well, if the world were limited to just what I could imagine, how boring would that be? The more complete the feature set, the better off we are for my money.

You misinterpreted: I didn't know you could do that, therefore I didn't implement it. I certainly understand why you would want to :)
Re: replacing grep(1)
On 27 Jul 1999, Dag-Erling Smorgrav wrote:

> I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties.

Normally I don't post "me too" messages. I'll make an exception.

Me too.

--
- bill fumerola - [EMAIL PROTECTED] - BF1560 - computer horizons corp -
- ph:(800) 252-2421 - [EMAIL PROTECTED] - [EMAIL PROTECTED] -
Re: replacing grep(1)
On Tue, 27 Jul 1999, James Howard wrote:

> On Tue, 27 Jul 1999, Doug wrote:
> > Ah, well, if the world were limited to just what I could imagine, how boring would that be? The more complete the feature set, the better off we are for my money.
>
> You misinterpreted: I didn't know you could do that, therefore I didn't implement it. I certainly understand why you would want to :)

Ah, gotcha. Even better. :)

Doug
Re: replacing grep(1)
$ uname -a
$ grep foo NONEXIST
Segmentation fault (core dumped)
$ gdb /usr/bin/grep grep.core
...
(no debugging symbols found)...
Core was generated by `grep'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libz.so.2...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libc.so.3...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
#0 0x280a8538 in ftello (fp=0x0) at /FBSD/src/lib/libc/../libc/stdio/ftell.c:76
76 if (fp->_seek == NULL) {
(gdb) where
#0 0x280a8538 in ftello (fp=0x0) at /FBSD/src/lib/libc/../libc/stdio/ftell.c:76
#1 0x280a84e1 in ftell (fp=0x0) at /FBSD/src/lib/libc/../libc/stdio/ftell.c:59
#2 0x80490b7 in free () at /FBSD/src/lib/libc/../libc/stdlib/malloc.c:1089
#3 0x80499f1 in free () at /FBSD/src/lib/libc/../libc/stdlib/malloc.c:1089
#4 0x804968b in free () at /FBSD/src/lib/libc/../libc/stdlib/malloc.c:1089
#5 0x8048d3d in free () at /FBSD/src/lib/libc/../libc/stdlib/malloc.c:1089
(gdb)

--
-- David ([EMAIL PROTECTED] -or- [EMAIL PROTECTED])
Re: replacing grep(1)
On 27 Jul 1999 13:48:21 +0200, Dag-Erling Smorgrav wrote:

> > Version 0.3 broke port-building badly. Does version 0.7 make it through a build of a whole stack of ports?
>
> Yes.

Excellent. I'll nuke the port once you've merged the new grep to STABLE. :-)

Later,
Sheldon.
Re: replacing grep(1)
In xzpd7xeb9xc@des.follo.net Dag-Erling Smorgrav d...@yes.no wrote:

> Jamie Howard (howar...@wam.umd.edu), with a little help from yours truly, has written a BSD-licensed version of grep(1) which has all the functionality of our current (GPLed) implementation, plus a little more, in one seventh the source code and one fourth the binary code. What's more, the code is actually possible for mere mortals to read and understand.
>
> The source code is available for download from freefall:
>
> URL:http://www.freebsd.org/~des/software/grep-0.7.tar.gz
>
> I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties.

Unfortunately the abovementioned grep-0.7.tar.gz is broken. After making it on the CURRENT system I can only see:

grep: filename: Undefined error: 0

for every filename.

This is caused by the very unusual return values of 'grep_open' (and the other '..._open' functions), which are declared as 'int' (and return an int result) but are compared with NULL ;-( I prefer not to include the patch for this because I am incompatible with such tricks as:

return ((f = fopen(path, mode)) != NULL) - 1;

N.Dudorov
Re: replacing grep(1)
On Tue, 27 Jul 1999, Sheldon Hearn wrote:

> In this case, I'm all for the change, since I don't use grep for serious regex work and the readability gain outweighs any loss of performance. You probably feel the same way. Our opinions are those of developers, though. It's always worth remembering that.

Does anyone have numbers about how much slower the new grep is? I have been using the port (version 0.3) for my interactive grepping, and haven't noticed a speed difference. I have been using it on zippy machines though, where a 30% hit wouldn't be noticed.

David Scheidt
Re: replacing grep(1)
On Tue, 27 Jul 1999 07:49:22 EST, David Scheidt wrote:

> Does anyone have numbers about how much slower the new grep is?

Just by the way, if the latest version somehow uses mmap without my having noticed, then I've introduced a red herring. ;-)

Version 0.3 certainly didn't use mmap. As I understand it, this means that the performance hit, whatever the magnitude, would be noticed with larger files.

I've copied the author, who's probably in the best position to give you hard numbers. :-)

Ciao,
Sheldon.
Re: replacing grep(1)
On Tue, 27 Jul 1999, Sheldon Hearn wrote:

> On Tue, 27 Jul 1999 08:19:38 -0400, Brian F. Feldman wrote:
> > Getting rid of as much as possible, gradually, is a Very Good Thing; this is how we get stability and performance improvements.
>
> Only if the replacements are as stable and robust as their predecessors.

Usually, when we get replacements, they are.

> In this case, the implementation we'll be introducing will bring a performance loss, not a gain. As far as stability goes, there's a loss involved _if_ passing the GNU grep regression tests is important.

Which it isn't unless they are truly correct in their assumptions of output behavior.

> Don't get me wrong. I'm all for replacing GNU software. Let's just be realistic and keep in mind that being non-GNU doesn't necessarily mean better.

Not _necessarily_, but realistically...

> In this case, I'm all for the change, since I don't use grep for serious regex work and the readability gain outweighs any loss of performance. You probably feel the same way. Our opinions are those of developers, though. It's always worth remembering that.

That's true. I'd like to see the replacement grep do mmaping of the input files if it doesn't already, as that would speed it up. Anyway, I haven't tried it out yet because I haven't seen it hit 1.0 :) The only good pre-1.0 software I've seen has been the GIMP, XRacer, and some little utilities (like a program called stat(1)).

That reminds me. I'd like to see something like stat(1) go into the source tree, but only if it were freely licensed, not GPL-infected. I could do it in a day, I suppose, if it were worth it. "Worth it" is here defined as "would be accepted to go in usr.bin".

> Ciao,
> Sheldon.

Brian Fundakowski Feldman
gr...@freebsd.org
FreeBSD: The Power to Serve! http://www.FreeBSD.org/
Re: replacing grep(1)
On Tue, 27 Jul 1999, Brian F. Feldman wrote:

> That's true. I'd like to see the replacement grep do mmaping of the input files if it doesn't already, as that would speed it up. Anyway,

It does not use mmap right now, and this causes a significant performance hit on larger files. An older version (I'm thinking 0.4) would give equivalent performance on smaller files, 75k or so, and was occasionally faster. However, larger files really drag it down, often slower by 900%.

> I haven't tried it out yet because I haven't seen it hit 1.0 :) The only good pre-1.0 software I've seen has been the GIMP, XRacer, and some little utilities (like a program called stat(1)). That reminds me. I'd like to see something like stat(1) go into the source tree, but only if it were freely licensed, not GPL-infected. I could do it in a day, I suppose, if it were worth it. "Worth it" is here defined as "would be accepted to go in usr.bin".

I once saw a version of stat that carried a public domain statement on an HP-UX software archive. I'll see if I can dig that up for you.

Jamie
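For readers unfamiliar with the mmap approach being discussed, a minimal sketch follows. It is not code from either grep: `count_lines_mmap` is a hypothetical function that maps a file read-only and walks it with memchr(), so lines are located in place rather than copied out one at a time.

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical sketch: count newline-terminated lines by mapping the
 * file and scanning it with memchr().  Returns the count, or -1 on
 * error.  An empty file counts as 0 lines.
 */
static long
count_lines_mmap(const char *path)
{
	struct stat sb;
	long lines = 0;
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return -1;
	if (fstat(fd, &sb) == -1) {
		close(fd);
		return -1;
	}
	if (sb.st_size == 0) {		/* a zero-length mmap is an error */
		close(fd);
		return 0;
	}

	size_t len = (size_t)sb.st_size;
	char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

	close(fd);			/* the mapping outlives the descriptor */
	if (base == MAP_FAILED)
		return -1;

	const char *p = base;
	const char *end = base + len;

	while (p < end) {
		const char *nl = memchr(p, '\n', (size_t)(end - p));

		if (nl == NULL)
			break;
		lines++;
		p = nl + 1;
	}
	munmap(base, len);
	return lines;
}
```

A real grep would run its matcher over each `p`..`nl` span instead of just counting. Whether mapping actually beats buffered reads on a given system is something to benchmark rather than assume.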
Re: replacing grep(1)
Brian F. Feldman wrote: Geez, why don't we just write our own compiler and linker, assembler, and everything? Let's get every last bit of GNU out of our system, for no reason! This kind of NIH is not necessary, and only hurts us by misdirecting our energies. /joking Seriously, I'd love for this to happen. Most GNU software is a hopeless, gruesome mess that should be dragged out and shot. Getting rid of as much as possible, gradually, is a Very Good Thing; this is how we get stability and performance improvements. In fact, I think the *greatest* advantage of this code is its readability. Anyway, both versions exist, so it's not a question of NIH. It's a question of choosing. -- Daniel C. Sobral(8-DCS) d...@newsguy.com d...@freebsd.org Is it true that you're a millionaire's son who never worked a day in your life? Yeah, I guess so. Lemme tell you, son, you ain't missed a thing.
Re: replacing grep(1)
Dag-Erling Smorgrav wrote: Jamie Howard (howar...@wam.umd.edu), with a little help from yours truly, has written a BSD-licensed version of grep(1) which has all the functionality of our current (GPLed) implementation, plus a little more, in one seventh the source code and one fourth the binary code. What's more, the code is actually possible for mere mortals to read and understand. The source code is available for download from freefall: URL:http://www.freebsd.org/~des/software/grep-0.7.tar.gz I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties. I'm concerned about performance. Grep performance is relevant to some. Now, while I don't care if this grep is slower than what we are using right now, I do care if its _complexity_ is greater. So, please, could you make sure the algorithmic complexity is not greater, either by benchmark comparison, or by examining the code? I would do it, if I had time. But right now I don't, and there is no need to keep this waiting. -- Daniel C. Sobral(8-DCS) d...@newsguy.com d...@freebsd.org Is it true that you're a millionaire's son who never worked a day in your life? Yeah, I guess so. Lemme tell you, son, you ain't missed a thing.
Re: replacing grep(1)
Brian F. Feldman wrote: That reminds me. I'd like to see something like stat(1) go into the source tree, but only if it were freely licensed, not GPL-infected. I could do it in a day, I suppose, if it were worth it. Worth it is here defined as would be accepted to go in usr.bin. May I discreetly open a can of worms and remind everyone of a very nice little utility one Matthew Dillon once offered for /bin? I still think it's worth it, and, as I recall, I wasn't the only one. (In fact, I think I didn't even voice my opinion at the time...) I'm talking about cpdup, which can be found in http://www.backplane.com/FreeBSD/. Someone posted a port at the time, but I don't know if anyone ever committed the port. -- Daniel C. Sobral(8-DCS) d...@newsguy.com d...@freebsd.org Is it true that you're a millionaire's son who never worked a day in your life? Yeah, I guess so. Lemme tell you, son, you ain't missed a thing.
Re: replacing grep(1)
On Tue, 27 Jul 1999 23:18:14 +0900, Daniel C. Sobral wrote: I'm talking about cpdup, which can be found in http://www.backplane.com/FreeBSD/. Someone posted a port at the time, but I don't know if anyone ever committed the port. I'll commit a port in the next few days. Ciao, Sheldon.
Re: replacing grep(1)
At 9:29 AM -0400 7/27/99, Tim Vanderhoek wrote: On a file with 10+ lines, the speed difference is rather restrictive. [...] Only about 10% of the time is spent in procline(). There seems to be a lot of unnecessary strncpy() that could be _easily_ avoided if free() on util.c:130 was avoided, but I'll let the authors speak first. :-) Hmm, strncpy? Are these calls which really want strncpy for what it was originally designed for, or are they just trying to prevent buffer overruns? If it's the buffer-overrun answer, then maybe this would be a good test case for using strlcpy instead of strncpy, and see if it makes a performance difference (since the code won't waste its time nulling-out bytes that don't need to be nulled-out). --- Garance Alistair Drosehn = g...@eclipse.acs.rpi.edu Senior Systems Programmer or dro...@rpi.edu Rensselaer Polytechnic Institute
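The difference Garance describes can be made concrete with a small sketch. strncpy(3) pads the destination with NUL bytes up to the full length given, while strlcpy(3) writes only the string and one terminator. Since strlcpy is not available everywhere, the block below uses a stand-in named `sketch_strlcpy` with the same semantics. That name and the helper `bytes_touched` are hypothetical, written only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in with strlcpy(3) semantics: copy at most size-1
 * bytes, always NUL-terminate when size > 0, and return strlen(src) so
 * the caller can detect truncation.
 */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);

    if (size > 0) {
        size_t n = srclen < size - 1 ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';          /* exactly one terminator, no padding */
    }
    return srclen;
}

/* Count how many bytes of buf[0..size) no longer hold the fill byte. */
size_t bytes_touched(const char *buf, size_t size, char fill)
{
    size_t n = 0;

    for (size_t i = 0; i < size; i++)
        if (buf[i] != fill)
            n++;
    return n;
}
```

Pre-filling two 4096-byte buffers with a marker byte and copying a 4-character string into each, strncpy overwrites all 4096 bytes while the strlcpy-style copy overwrites 5. Whether that shows up in grep's profile depends on how large the buffers at the call sites really are.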
Re: replacing grep(1)
Jamie Howard (howar...@wam.umd.edu), with a little help from yours truly, has written a BSD-licensed version of grep(1) which has all the functionality of our current (GPLed) implementation, plus a little more, in one seventh the source code and one fourth the binary code. I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties.

A couple of general problems:

o Too many diagnostics have "Undefined error: 0" appended. Particularly in the case of err(2, re_error) in file.c, you probably want to look at using errx() instead.

o Errors other than "no match" need to return an exit status of 2: some in file.c and util.c are returning 1.

A more general concern is whether Henry Spencer's regex routines -- at least in our present alpha-quality version -- are up to supporting a grep without much further debugging. I don't recall many of the problems I found when I last looked at these, though here are two, after 5 minutes playing:

echo xx | grep '\(x\{1,2\}\)\1'
echo x | grep '[--x]'

-- Robert Nordier
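The two points about diagnostics and exit status can be sketched together. err(3) appends strerror(errno) to its message, so when the failure comes from the regex library and errno is still 0, the stray "Undefined error: 0" text appears. errx(3) prints only the message it is given. The sketch below is not taken from file.c or util.c. `match_status` and the enum names are invented for illustration, and it returns the status rather than exiting so that it can be exercised directly.

```c
#include <regex.h>
#include <stddef.h>

/* Exit statuses grep is expected to use. */
enum { STATUS_MATCH = 0, STATUS_NOMATCH = 1, STATUS_TROUBLE = 2 };

/*
 * Hypothetical helper: compile `pattern` as a basic RE and test `line`.
 * On a regex error the library's own message is copied into errbuf and
 * STATUS_TROUBLE (2) is returned. A caller would then report it with
 *     errx(STATUS_TROUBLE, "%s", errbuf);
 * rather than err(), because errno says nothing about regex failures.
 */
int match_status(const char *pattern, const char *line,
                 char *errbuf, size_t errlen)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_NOSUB);

    if (rc != 0) {
        regerror(rc, &re, errbuf, errlen);
        return STATUS_TROUBLE;
    }

    rc = regexec(&re, line, 0, NULL, 0);
    if (rc != 0 && rc != REG_NOMATCH)
        regerror(rc, &re, errbuf, errlen);  /* e.g. out of memory */
    regfree(&re);

    if (rc == 0)
        return STATUS_MATCH;
    return rc == REG_NOMATCH ? STATUS_NOMATCH : STATUS_TROUBLE;
}
```

Only a genuine "no match" maps to 1. Every other failure, including a pattern that does not compile, maps to 2.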
Re: replacing grep(1)
On Tue, 27 Jul 1999, Brian F. Feldman wrote: On Tue, 27 Jul 1999, Soren Schmidt wrote: It seems Dag-Erling Smorgrav wrote: I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties. Go for it, the more GNU stuff we nuke the better :) -Søren Geez, why don't we just write our own compiler and linker, assembler, and everything? Let's get every last bit of GNU out of our system, for no reason! This kind of NIH is not necessary, and only hurts us by misdirecting our energies. /joking Actually there is a difference between grep and gcc. You wouldn't ship cc on a binary-only embedded system, but you might want to ship grep (so that control scripts can use it). Seriously, I'd love for this to happen. Most GNU software is a hopeless, gruesome mess that should be dragged out and shot. Getting rid of as much as possible, gradually, is a Very Good Thing; this is how we get stability and performance improvements. Brian Fundakowski Feldman gr...@freebsd.org FreeBSD: The Power to Serve! http://www.FreeBSD.org/
Re: replacing grep(1)
On 27 Jul 1999, Dag-Erling Smorgrav wrote: I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties. First, I'm all for this idea, and applaud you and Jamie for taking it on. I do have a few questions. Does POSIX say anything about grep, and if so, is this version compliant? Also, I'd like to put in another vote for full GNU grep feature compliance, since while having our own code is a good thing, I am against introducing gratuitous differences since I have enough of those to deal with already. I think ports building is a good test, but has anyone tested it with RCS yet? IIRC RCS is heavily dependent on GNU grep, diff and patch. I don't think CVS is dependent on external programs anymore though. I use grep heavily in day to day administration tasks so I look forward to giving this a try. Doug -- On account of being a democracy and run by the people, we are the only nation in the world that has to keep a government four years, no matter what it does. -- Will Rogers
Re: replacing grep(1)
On 1999-07-27 13:37:35 +0200, Dag-Erling Smorgrav wrote: Jamie Howard (howar...@wam.umd.edu), with a little help from yours truly, has written a BSD-licensed version of grep(1) which has all the functionality of our current (GPLed) implementation, plus a little more, in one seventh the source code and one fourth the binary code. What's more, the code is actually possible for mere mortals to read and understand. The source code is available for download from freefall: URL:http://www.freebsd.org/~des/software/grep-0.7.tar.gz I move that we replace GNU grep in our source tree with this implementation, once it's been reviewed by all concerned parties.

It is 25 times slower than GNU grep ;-(((

$ time /usr/bin/grep foobar /var/tmp/mailbox /dev/null
0.90 real 0.78 user 0.12 sys
$ time /usr/local/bin/grep foobar /var/tmp/mailbox /dev/null
24.31 real 22.36 user 1.69 sys

(/var/tmp/mailbox is 81MB large). I often use grep for large data (in main memory). I don't care about the GNU license. I care about poor performance. -- Wolfram Schneider wo...@freebsd.org http://wolfram.schneider.org
Re: replacing grep(1)
On Tue, 27 Jul 1999, Doug wrote: First, I'm all for this idea, and applaud you and Jamie for taking it on. I do have a few questions. Does POSIX say anything about grep, and if so, is this version compliant? Also, I'd like to put in another vote for full GNU grep feature compliance, since while having our own code is a good thing, I am against introducing gratuitous differences since I have enough of those to deal with already. I do not have a copy of POSIX, but I do have Unix98 which is a superset of POSIX. Right now, excluding bugs, it is Unix 98 and therefore POSIX compliant except for -e. -e should permit multiple patterns and it never occurred to me that anyone would want to do this. When used with -F, multiple patterns are accepted. I use grep heavily in day to day administration tasks so I look forward to giving this a try. Cool, d/l it and post a bug-list :) Jamie
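One straightforward way to support several -e patterns is to compile each one separately and select a line when any of them matches. The sketch below assumes that approach; it is not how Jamie's grep or GNU grep implement it. The function names `compile_patterns`, `matches_any` and `free_patterns` are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Compile n patterns as basic REs into res[]. Returns 0 on success.
 * On failure, frees the patterns compiled so far and returns -1.
 */
int compile_patterns(regex_t *res, const char *const *patterns, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (regcomp(&res[i], patterns[i], REG_NOSUB) != 0) {
            while (i > 0)
                regfree(&res[--i]);
            return -1;
        }
    }
    return 0;
}

/* Return 1 if `line` matches at least one compiled pattern, else 0. */
int matches_any(const regex_t *res, size_t n, const char *line)
{
    for (size_t i = 0; i < n; i++)
        if (regexec(&res[i], line, 0, NULL, 0) == 0)
            return 1;
    return 0;
}

/* Release every pattern compiled by compile_patterns(). */
void free_patterns(regex_t *res, size_t n)
{
    for (size_t i = 0; i < n; i++)
        regfree(&res[i]);
}
```

This runs regexec once per pattern per line, so the cost grows with the number of patterns. Joining the patterns into one alternation is the other obvious design, at the price of depending on how well the regex library handles alternation.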
Re: replacing grep(1)
On Tue, 27 Jul 1999, Jamie Howard wrote: I do not have a copy of POSIX, but I do have Unix98 which is a superset of POSIX. Right now, excluding bugs, it is Unix 98 and therefore POSIX compliant Good news, thanks for addressing this concern. except for -e. -e should permit multiple patterns and it never occurred to me that anyone would want to do this. Ah, well, if the world were limited to just what I could imagine, how boring would that be? The more complete the feature set, the better off we are for my money. Doug -- On account of being a democracy and run by the people, we are the only nation in the world that has to keep a government four years, no matter what it does. -- Will Rogers