Re: Grepping a list of words
If I remember correctly, grep (and all its associated versions) accepts -v as an option, which reports the lines that don't match. Using -f (which is given the name[s] of files) uses those files as a list of the patterns to match.

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org
Re: Grepping a list of words
From owner-freebsd-questi...@freebsd.org Thu Aug 12 05:36:27 2010
Date: Wed, 11 Aug 2010 18:00:22 -0500
To: freebsd-questions@freebsd.org
From: Jack L. Stone ja...@sage-american.com
Subject: Grepping a list of words

> Kindly appreciate help with how to grep (or similar) a list of words to determine if any of them are in a file rather than grepping one word at a time.
>
> Thanks for any suggestions...

1) egrep 'word1|word2|word3|word4|...|wordN' file
2) grep -F -f wordlist_file sourcefile

The proverbial advice about the fine manpage is relevant. :)
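Option 2 can be tried out with a minimal sketch like the one below. The file contents are made up for illustration, and the temporary directory is only there to keep the example self-contained; `-F` treats each pattern as a fixed string and `-f` reads the patterns, one per line, from a file.

```shell
# Hypothetical word list and source file, created only for this illustration.
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/wordlist_file"
printf 'I ate an apple.\nNothing here.\nA cherry on top.\n' > "$workdir/sourcefile"

# Print every line of sourcefile that contains at least one listed word.
matches=$(grep -F -f "$workdir/wordlist_file" "$workdir/sourcefile")
printf '%s\n' "$matches"

rm -rf "$workdir"
```

Note that this matches substrings, so "apple" would also match a line containing "pineapple"; adding `-w` restricts matches to whole words.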
Re: Grepping a list of words
At 10:56 AM 8.12.2010 -0700, Chip Camden wrote:
> Quoth Anonymous on Thursday, 12 August 2010:
>> Oliver Fromme o...@lurza.secnetix.de writes:
>>> John Levine jo...@iecc.com wrote:
>>>>>> % egrep 'word1|word2|word3|...|wordn' filename.txt
>>>>>
>>>>> Thanks for the replies. This suggestion won't do the job as the list of words is very long, maybe 50-60. This is why I asked how to place them all in a file. One reply dealt with using a file with egrep. I'll try that.
>>>>
>>>> Gee, 50 words, that's about a 300 character pattern, that's not a problem for any shell or version of grep I know. But reading the words from a file is equivalent and as you note most likely easier to do.
>>>
>>> The question is what is more efficient. This might be important if that kind of grep command is run very often by a script, or if it's run on very large files.
>>>
>>> My guess is that one large regular expression is more efficient than many small ones. But I haven't done real benchmarks to prove this.
>>
>> BTW, not using regular expressions is even more efficient, e.g.
>>
>> $ fgrep -f /usr/share/dict/words /etc/group
>>
>> When using egrep(1) it takes considerably more time and memory.
>
> Having written a regex engine myself, I can see why. Though I'm sure egrep is highly optimized, even the most optimized DFA table is going to take more cycles to navigate than a simple string comparison. Not to mention the initial overhead of parsing the regex and building that table.
>
> --
> Sterling (Chip) Camden | sterl...@camdensoftware.com | 2048D/3A978E4F

Many thanks for all of the suggestions. I found this worked very well, ignoring concerns about use of resources:

egrep -i -o -w -f word.file main.file

The only thing it didn't do for me was the next step. My final objective was really to determine the words in the word.file that were not in the main.file. I figured finding matches would be easy and that I could then run a sort|uniq comparison to determine the new words not yet in the main.file.

Since I will have a need to run this check frequently, any suggestions for a better approach are welcome.

Thanks again...

Jack

(^_^)
Happy trails,
Jack L. Stone

System Admin
Sage-american
Re: Grepping a list of words
On Friday 13 August 2010 15:47:38 Jack L. Stone wrote:
> The only thing it didn't do for me was the next step. My final objective was really to determine the words in the word.file that were not in the main.file. I figured finding matches would be easy and that I could then run a sort|uniq comparison to determine the new words not yet in the main.file.
>
> Since I will have a need to run this check frequently, any suggestions for a better approach are welcome.

sort -u and comm(1)?

comm will compare two sorted files and produce up to three lists: words only in file 1, words only in file 2, and words common to both files. You can suppress any or all of the output lists.

Jonathan
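The sort -u and comm(1) suggestion could be put together roughly as follows. This is only a sketch under stated assumptions: the file contents are invented, the word list is assumed to be lower case with one word per line, `grep -E` is used as the modern spelling of egrep, and the intermediate file names (`found.sorted`, `words.sorted`) are hypothetical.

```shell
# Hypothetical inputs, created only for this illustration.
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\ndate\n' > "$workdir/word.file"
printf 'An Apple a day.\nCherry pie and apple tart.\n' > "$workdir/main.file"

# Collect the listed words that occur in main.file:
# -o prints only the matched text, -w matches whole words, -i ignores case.
# Lower-casing makes the matches comparable with the (lower-case) word list.
grep -E -i -o -w -f "$workdir/word.file" "$workdir/main.file" |
    tr '[:upper:]' '[:lower:]' |
    LC_ALL=C sort -u > "$workdir/found.sorted"

# comm needs both inputs sorted in the same collation order.
LC_ALL=C sort -u "$workdir/word.file" > "$workdir/words.sorted"

# -23 suppresses column 2 (only in found.sorted) and column 3 (common),
# leaving the words that appear only in the word list.
missing=$(LC_ALL=C comm -23 "$workdir/words.sorted" "$workdir/found.sorted")
printf '%s\n' "$missing"

rm -rf "$workdir"
```

Pinning `LC_ALL=C` for both sort and comm is a design choice to avoid locale-dependent ordering mismatches between the two tools.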
Re: Grepping a list of words
At 04:01 PM 8.13.2010 +0200, Jonathan McKeown wrote:
> On Friday 13 August 2010 15:47:38 Jack L. Stone wrote:
>> The only thing it didn't do for me was the next step. My final objective was really to determine the words in the word.file that were not in the main.file. I figured finding matches would be easy and that I could then run a sort|uniq comparison to determine the new words not yet in the main.file.
>>
>> Since I will have a need to run this check frequently, any suggestions for a better approach are welcome.
>
> sort -u and comm(1)?
>
> comm will compare two sorted files and produce up to three lists: words only in file 1, words only in file 2, and words common to both files. You can suppress any or all of the output lists.
>
> Jonathan

Jonathan:

Thanks, I had forgotten about comm(1). Methinks I am close to the solution to the whole issue now.

Jack

(^_^)
Happy trails,
Jack L. Stone

System Admin
Sage-american
Re: Grepping a list of words
>> Since I will have a need to run this check frequently, any suggestions for a better approach are welcome.
>
> sort -u and comm(1)?

sort is O(N log N) while grep is O(N). Which is faster depends on the constant factors in each, but as the data sets get bigger, the log N term will dominate.

That is, for small sets of data, I don't know which will be faster, but either will be fast enough, so who cares. For large sets of data, the sort will be slow.

R's,
John
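If the sort cost ever did matter, one way to avoid sorting altogether is a single awk pass with associative arrays, where each lookup is expected to take roughly constant time. This is a swapped-in technique that nobody in the thread proposed, shown only as a sketch; the file contents are made up, and the word list is assumed to hold lower-case, letters-only words, one per line.

```shell
# Hypothetical inputs, created only for this illustration.
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\ndate\n' > "$workdir/word.file"
printf 'An Apple a day.\nCherry pie and apple tart.\n' > "$workdir/main.file"

missing=$(awk '
    # First file: remember every listed word and its position.
    NR == FNR { want[$0] = 1; order[++n] = $0; next }
    # Second file: lower-case the line, split on non-letters, mark listed words as seen.
    {
        count = split(tolower($0), field, /[^a-z]+/)
        for (i = 1; i <= count; i++)
            if (field[i] in want) seen[field[i]] = 1
    }
    # Print the listed words that were never seen, in their original order.
    END { for (i = 1; i <= n; i++) if (!(order[i] in seen)) print order[i] }
' "$workdir/word.file" "$workdir/main.file")
printf '%s\n' "$missing"

rm -rf "$workdir"
```

For a list of 50 to 60 words the difference is unlikely to be measurable, so this only becomes interesting for very large inputs.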
Grepping a list of words
Kindly appreciate help with how to grep (or similar) a list of words to determine if any of them are in a file rather than grepping one word at a time.

Thanks for any suggestions...

All the best,
Jack

(^_^)
Happy trails,
Jack L. Stone

System Admin
Sage-american
Re: Grepping a list of words
On Wed, Aug 11, 2010 at 06:00:22PM -0500, Jack L. Stone wrote:
> Kindly appreciate help with how to grep (or similar) a list of words to determine if any of them are in a file rather than grepping one word at a time.

Something like this should do the trick:

egrep '(word1|word2|word3)' file

Dan

-- 
Daniel Bye
                                     _
              ASCII ribbon campaign ( )
         - against HTML, vCards and  X
- proprietary attachments in e-mail / \
Re: Grepping a list of words
On Wed, Aug 11, 2010 at 06:00:22PM -0500, Jack L. Stone wrote:
> Kindly appreciate help with how to grep (or similar) a list of words to determine if any of them are in a file rather than grepping one word at a time.

put the list in a file, and use grep -f

better, use the \< and \> markers on the file's contents and use egrep. (grep -w option is likely to be buggy when available).

-- 
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net
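A sketch of the word-marker approach, assuming a grep that supports the `\<` and `\>` word-boundary extensions (they are not part of POSIX regular expressions). The file names and contents are hypothetical, and `grep -E` stands in for egrep.

```shell
# Hypothetical word list and text, created only for this illustration.
workdir=$(mktemp -d)
printf 'cat\ndog\n' > "$workdir/words"
printf 'concatenate\nthe cat sat\ndogma\nhot dog\n' > "$workdir/text"

# Wrap every word in \< ... \> so it only matches as a whole word.
sed 's/.*/\\<&\\>/' "$workdir/words" > "$workdir/patterns"

# "concatenate" and "dogma" contain the words but not as whole words.
matches=$(grep -E -f "$workdir/patterns" "$workdir/text")
printf '%s\n' "$matches"

rm -rf "$workdir"
```

The sed step assumes the words contain no regular-expression metacharacters; anything like `.` or `*` in the list would need escaping first.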
Re: Grepping a list of words
Jack L. Stone ja...@sage-american.com writes:
> Kindly appreciate help with how to grep (or similar) a list of words to determine if any of them are in a file rather than grepping one word at a time.

Perhaps, `-e' option?

$ printf 'foo\nbar\n' | fgrep -e foo -e bar
foo
bar
Re: Grepping a list of words
On Wed, 11 Aug 2010 18:00:22 -0500 Jack L. Stone ja...@sage-american.com wrote:
> Kindly appreciate help with how to grep (or similar) a list of words to determine if any of them are in a file rather than grepping one word at a time.

Use egrep

egrep '(word1|word2)' file
Re: Grepping a list of words
On 08/12/10 00:00, Jack L. Stone wrote:
> Kindly appreciate help with how to grep (or similar) a list of words to determine if any of them are in a file rather than grepping one word at a time.

fgrep, aka grep -F

A snippet from man grep:

-F, --fixed-strings
        Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.
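As the man page snippet suggests, the newline-separated list does not have to come from a file; it can be passed as the pattern argument itself. A minimal sketch with made-up words and input:

```shell
# Two fixed strings, separated by a newline, held in one shell variable.
patterns=$(printf 'foo\nbar')

# A line is printed if it contains any of the fixed strings.
matches=$(printf 'foo one\nbaz two\nbar three\n' | grep -F "$patterns")
printf '%s\n' "$matches"
```

Quoting `"$patterns"` matters here: unquoted, the shell would split the list and grep would treat the second word as a file name.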
Re: Grepping a list of words
Jack L Stone writes:
> Kindly appreciate help with how to grep (or similar) a list of words to determine if any of them are in a file rather than grepping one word at a time.

#v+
% egrep 'word1|word2|word3|...|wordn' filename.txt
#v-

'word1|word2|word3|...|wordn' is the regular expression, so if you can minimize it, so much the better for you :).

HTH
-- 
Ashish SHUKLA      | GPG: F682 CDCC 39DC 0FEA E116 20B6 C746 CFA9 E74F A4B0
freebsd.org!ashish | http://people.freebsd.org/~ashish/

“There is nothing new to be discovered in physics now; All that remains is more and more precise measurement.” (Lord Kelvin, 1900)
Re: Grepping a list of words
At 05:14 PM 8.12.2010 +0530, Ashish SHUKLA wrote:
> Jack L Stone writes:
>> Kindly appreciate help with how to grep (or similar) a list of words to determine if any of them are in a file rather than grepping one word at a time.
>
> #v+
> % egrep 'word1|word2|word3|...|wordn' filename.txt
> #v-
>
> 'word1|word2|word3|...|wordn' is the regular expression, so if you can minimize it, so much the better for you :).
>
> HTH
> --
> Ashish SHUKLA | GPG: F682 CDCC 39DC 0FEA E116 20B6 C746 CFA9 E74F A4B0

Thanks for the replies. This suggestion won't do the job as the list of words is very long, maybe 50-60. This is why I asked how to place them all in a file. One reply dealt with using a file with egrep. I'll try that.

Appreciate the help, and any other suggestions in case that one doesn't work.

All the best,
Jack

(^_^)
Happy trails,
Jack L. Stone

System Admin
Sage-american
Re: Grepping a list of words
>> % egrep 'word1|word2|word3|...|wordn' filename.txt
>
> Thanks for the replies. This suggestion won't do the job as the list of words is very long, maybe 50-60. This is why I asked how to place them all in a file. One reply dealt with using a file with egrep. I'll try that.

Gee, 50 words, that's about a 300 character pattern, that's not a problem for any shell or version of grep I know. But reading the words from a file is equivalent and as you note most likely easier to do.

R's,
John
Re: Grepping a list of words
John Levine jo...@iecc.com wrote:
>>> % egrep 'word1|word2|word3|...|wordn' filename.txt
>>
>> Thanks for the replies. This suggestion won't do the job as the list of words is very long, maybe 50-60. This is why I asked how to place them all in a file. One reply dealt with using a file with egrep. I'll try that.
>
> Gee, 50 words, that's about a 300 character pattern, that's not a problem for any shell or version of grep I know. But reading the words from a file is equivalent and as you note most likely easier to do.

The question is what is more efficient. This might be important if that kind of grep command is run very often by a script, or if it's run on very large files.

My guess is that one large regular expression is more efficient than many small ones. But I haven't done real benchmarks to prove this.

Best regards
Oliver

-- 
Oliver Fromme, secnetix GmbH Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Munich registry court, HRA 74606, Management: secnetix Verwaltungsgesellsch. mbH, Commercial register: Munich registry court, HRB 125758, Managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

To this day, many C programmers believe that 'strong typing' just means pounding extra hard on the keyboard.
-- Peter van der Linden
Re: Grepping a list of words
Oliver Fromme o...@lurza.secnetix.de writes:
> John Levine jo...@iecc.com wrote:
>>>> % egrep 'word1|word2|word3|...|wordn' filename.txt
>>>
>>> Thanks for the replies. This suggestion won't do the job as the list of words is very long, maybe 50-60. This is why I asked how to place them all in a file. One reply dealt with using a file with egrep. I'll try that.
>>
>> Gee, 50 words, that's about a 300 character pattern, that's not a problem for any shell or version of grep I know. But reading the words from a file is equivalent and as you note most likely easier to do.
>
> The question is what is more efficient. This might be important if that kind of grep command is run very often by a script, or if it's run on very large files.
>
> My guess is that one large regular expression is more efficient than many small ones. But I haven't done real benchmarks to prove this.

BTW, not using regular expressions is even more efficient, e.g.

$ fgrep -f /usr/share/dict/words /etc/group

When using egrep(1) it takes considerably more time and memory.
Re: Grepping a list of words
Quoth Anonymous on Thursday, 12 August 2010:
> Oliver Fromme o...@lurza.secnetix.de writes:
>> John Levine jo...@iecc.com wrote:
>>>>> % egrep 'word1|word2|word3|...|wordn' filename.txt
>>>>
>>>> Thanks for the replies. This suggestion won't do the job as the list of words is very long, maybe 50-60. This is why I asked how to place them all in a file. One reply dealt with using a file with egrep. I'll try that.
>>>
>>> Gee, 50 words, that's about a 300 character pattern, that's not a problem for any shell or version of grep I know. But reading the words from a file is equivalent and as you note most likely easier to do.
>>
>> The question is what is more efficient. This might be important if that kind of grep command is run very often by a script, or if it's run on very large files.
>>
>> My guess is that one large regular expression is more efficient than many small ones. But I haven't done real benchmarks to prove this.
>
> BTW, not using regular expressions is even more efficient, e.g.
>
> $ fgrep -f /usr/share/dict/words /etc/group
>
> When using egrep(1) it takes considerably more time and memory.

Having written a regex engine myself, I can see why. Though I'm sure egrep is highly optimized, even the most optimized DFA table is going to take more cycles to navigate than a simple string comparison. Not to mention the initial overhead of parsing the regex and building that table.

-- 
Sterling (Chip) Camden    | sterl...@camdensoftware.com | 2048D/3A978E4F
http://camdensoftware.com | http://chipstips.com        | http://chipsquips.com
Re: Grepping a list of words
>> Gee, 50 words, that's about a 300 character pattern, that's not a problem for any shell or version of grep I know. But reading the words from a file is equivalent and as you note most likely easier to do.
>
> The question is what is more efficient. This might be important if that kind of grep command is run very often by a script, or if it's run on very large files.

It's exactly the same, since it's the same program using the same search algorithm. The only thing that's different is the input language for the pattern. What looks like a bunch of separate patterns in the input file is internally turned into one pattern that is then compiled into a state machine that it uses to match the input.

R's,
John
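The equivalence of the two input forms can at least be checked from the outside. The sketch below joins a word file into one alternation with paste(1) and compares the matches; it says nothing about grep's internals or its speed, and the file contents are invented for the illustration.

```shell
# Hypothetical inputs, created only for this illustration.
workdir=$(mktemp -d)
printf 'word1\nword2\nword3\n' > "$workdir/word.file"
printf 'has word2 in it\nnothing\nword3 and word1\n' > "$workdir/main.file"

# Join the lines of the word file with "|" to form a single alternation.
pattern=$(paste -s -d '|' "$workdir/word.file")

# Same search, two ways of supplying the patterns.
from_file=$(grep -E -f "$workdir/word.file" "$workdir/main.file")
from_pattern=$(grep -E "$pattern" "$workdir/main.file")

if [ "$from_file" = "$from_pattern" ]; then
    echo "same matches"
fi

rm -rf "$workdir"
```

Wrapping each command in time(1) on realistic data would be the way to test the efficiency question, which this sketch deliberately leaves open.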