Re: Grepping a list of words

2010-08-18 Thread gs_stol...@juno.com
If I remember correctly, grep (and all its associated versions) accepts -v
as an option, which reports the lines that don't match. grep -f (which is
given the name[s] of files) uses those files as a list of the patterns to
match.
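A minimal sketch of the two options mentioned above, run on made-up sample files in a scratch directory (the file names are hypothetical). Note that -v inverts the selection over the lines of the searched file: it prints text lines that match none of the listed patterns, not the patterns that went unmatched.

```shell
#!/bin/sh
# Scratch directory with hypothetical sample files, removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'apple\ncherry\n' > "$tmp/words.txt"
printf 'I ate an apple\nthe sky is blue\ncherry pie\n' > "$tmp/text.txt"

# -f: each line of words.txt is a pattern; print text lines matching any.
matches=$(grep -f "$tmp/words.txt" "$tmp/text.txt")
# -v -f: print the text lines that match none of the patterns.
nonmatches=$(grep -v -f "$tmp/words.txt" "$tmp/text.txt")

printf 'matches:\n%s\nnon-matches:\n%s\n' "$matches" "$nonmatches"
```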

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: Grepping a list of words

2010-08-14 Thread Robert Bonomi
 From owner-freebsd-questi...@freebsd.org  Thu Aug 12 05:36:27 2010
 Date: Wed, 11 Aug 2010 18:00:22 -0500
 To: freebsd-questions@freebsd.org
 From: Jack L. Stone ja...@sage-american.com
 Subject: Grepping a list of words

 Kindly appreciate help with how to grep (or similar) a list of words to
 determine if any of them are in a file rather than grepping one word at a
 time.

 Thanks for any suggestions...


1)  egrep '(word1|word2|word3|word4|...|wordN)' file


2)  grep -F -f wordlist_file sourcefile
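Both forms can be tried side by side on throwaway data. This is a sketch with hypothetical sample files; it uses grep -E, which is the same as egrep. The regex in form 1 is single-quoted so the shell passes the parentheses and bars to grep instead of interpreting them itself.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'the quick brown fox\njumps over\nthe lazy dog\n' > "$tmp/sourcefile"
printf 'fox\ndog\n' > "$tmp/wordlist_file"

# 1) One extended regex; the single quotes keep ( | ) away from the shell.
out1=$(grep -E '(fox|dog)' "$tmp/sourcefile")

# 2) Fixed strings read from a file, one per line.
out2=$(grep -F -f "$tmp/wordlist_file" "$tmp/sourcefile")

printf '%s\n' "$out1" "$out2"
```

Both print the same two lines here. Keep in mind that -F matches substrings, so "fox" would also match "foxes"; adding -w restricts the match to whole words.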



The proverbial advice about the fine manpage is relevant. :)




Re: Grepping a list of words

2010-08-13 Thread Jack L. Stone
At 10:56 AM 8.12.2010 -0700, Chip Camden wrote:
Quoth Anonymous on Thursday, 12 August 2010:
 Oliver Fromme o...@lurza.secnetix.de writes:
 
  John Levine jo...@iecc.com wrote:
  % egrep 'word1|word2|word3|...|wordn' filename.txt

 Thanks for the replies. This suggestion won't do the job as the list of
 words is very long, maybe 50-60. This is why I asked how to place them all
 in a file. One reply dealt with using a file with egrep. I'll try that.

Gee, 50 words, that's about a 300 character pattern, that's not a problem
for any shell or version of grep I know.

But reading the words from a file is equivalent and as you note most
likely easier to do.
 
  The question is what is more efficient.  This might be
  important if that kind of grep command is run very often
  by a script, or if it's run on very large files.
 
  My guess is that one large regular expression is more
  efficient than many small ones.  But I haven't done real
  benchmarks to prove this.
 
 BTW, not using regular expressions is even more efficient, e.g.
 
   $ fgrep -f /usr/share/dict/words /etc/group
 
 When using egrep(1) it takes considerably more time and memory.

Having written a regex engine myself, I can see why.  Though I'm sure
egrep is highly optimized, even the most optimized DFA table is going to take
more cycles to navigate than a simple string comparison.  Not to mention the
initial overhead of parsing the regex and building that table.

-- 
Sterling (Chip) Camden| sterl...@camdensoftware.com | 2048D/3A978E4F

Many thanks to all of the suggestions. I found this worked very well,
ignoring concerns about use of resources:

egrep -i -o -w -f word.file main.file
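For reference, the flags in that command break down as follows. This sketch reruns it on small made-up files (the contents are hypothetical) to show that -o prints each matched word as it appears in main.file, one per line, keeping main.file's case.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'apple\nPear\nplum\n' > "$tmp/word.file"
printf 'An Apple a day.\npineapple and pear\n' > "$tmp/main.file"

# -i  ignore case
# -o  print only the matched text, one match per line
# -w  whole words only, so "pineapple" does not count as "apple"
# -f  read the patterns from word.file
# (grep -E is the same as egrep)
found=$(grep -E -i -o -w -f "$tmp/word.file" "$tmp/main.file")
printf '%s\n' "$found"
```

Because the output keeps the case from main.file, the matches may need lowercasing (for example with tr '[:upper:]' '[:lower:]') before they are compared with word.file.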

The only thing it didn't do for me was the next step. My final objective
was to really determine the words in the word.file that were not in the
main.file. I figured finding matches would be easy and then I could
run a sort|uniq comparison to determine the new words not yet in the
main.file.

Since I will have a need to run this check frequently, any suggestions for
a better approach are welcome.

Thanks again...

Jack

(^_^)
Happy trails,
Jack L. Stone

System Admin
Sage-american


Re: Grepping a list of words

2010-08-13 Thread Jonathan McKeown
On Friday 13 August 2010 15:47:38 Jack L. Stone wrote:

 The only thing it didn't do for me was the next step. My final objective
 was to really determine the words in the word.file that were not in the
 main.file. I figured finding matches would be easy and then I could
 run a sort|uniq comparison to determine the new words not yet in the
 main.file.

 Since I will have a need to run this check frequently, any suggestions for
 a better approach are welcome.

sort -u and comm(1)?

comm will compare two sorted files and produce up to three lists: of words 
only in file one, of words only in file 2 and of words common to both files. 
You can suppress any or all of the output lists.
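A sketch of the sort -u plus comm(1) approach for the stated goal of listing the words in word.file that never occur in main.file. Assumptions: main.file is split into words on anything that is not a letter, the comparison is made case-insensitive by lowercasing both sides, and the sample file contents are made up.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'Apple\npear\nplum\n' > "$tmp/word.file"
printf 'An apple a day.\nPear pie\n' > "$tmp/main.file"

# One word per line (anything that is not a letter becomes a line break),
# lowercased, sorted, duplicates dropped.
tr -cs '[:alpha:]' '\n' < "$tmp/main.file" |
    tr '[:upper:]' '[:lower:]' | sort -u > "$tmp/main.words"
tr '[:upper:]' '[:lower:]' < "$tmp/word.file" | sort -u > "$tmp/list.words"

# comm -23 suppresses column 2 (only in main.words) and column 3 (in both),
# leaving the words that appear only in the word list.
missing=$(comm -23 "$tmp/list.words" "$tmp/main.words")
printf '%s\n' "$missing"
```

Both inputs to comm are sorted by the same sort in the same locale, which is what comm needs to line them up correctly.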

Jonathan


Re: Grepping a list of words

2010-08-13 Thread Jack L. Stone
At 04:01 PM 8.13.2010 +0200, Jonathan McKeown wrote:
On Friday 13 August 2010 15:47:38 Jack L. Stone wrote:

 The only thing it didn't do for me was the next step. My final objective
 was to really determine the words in the word.file that were not in the
 main.file. I figured finding matches would be easy and then I could
 run a sort|uniq comparison to determine the new words not yet in the
 main.file.

 Since I will have a need to run this check frequently, any suggestions for
 a better approach are welcome.

sort -u and comm(1)?

comm will compare two sorted files and produce up to three lists: of words 
only in file one, of words only in file 2 and of words common to both files. 
You can suppress any or all of the output lists.

Jonathan

Jonathan:

Thanks, I had forgotten about comm(1). Methinks I am close to the solution
to the whole issue now.

Jack

(^_^)
Happy trails,
Jack L. Stone

System Admin
Sage-american


Re: Grepping a list of words

2010-08-13 Thread John Levine
 Since I will have a need to run this check frequently, any suggestions for
 a better approach are welcome.

sort -u and comm(1)?

sort is O(N log N) while grep is O(N)

Which is faster depends on the constant factors in each, but as the
data sets get bigger, the extra log N factor will dominate.  That is, for
small sets of data, I don't know which will be faster, but either will
be fast enough so who cares.  For large sets of data, the sort will be
slow.
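If sorting main.file is the concern, one grep-only variant (not something proposed in this thread, and not benchmarked here) reads main.file once to collect the listed words that occur in it, then inverts that against word.file. It is a sketch with hypothetical sample files and a hypothetical intermediate file named found.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'apple\npear\nplum\n' > "$tmp/word.file"
printf 'an Apple a day\npear pie\nmore apple\n' > "$tmp/main.file"

# Pass 1: every whole-word, case-insensitive occurrence of a listed word.
grep -i -o -w -F -f "$tmp/word.file" "$tmp/main.file" > "$tmp/found"

# Pass 2: word-list lines that exactly match (-x) none (-v) of the found words.
missing=$(grep -i -v -x -F -f "$tmp/found" "$tmp/word.file")
printf '%s\n' "$missing"
```

Only the word list and the collected matches take part in the second pass; main.file itself is never sorted. If nothing from the list occurs in main.file, found is empty, so check how your grep treats an empty -f file before relying on that case.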

R's,
John


Grepping a list of words

2010-08-12 Thread Jack L. Stone
Kindly appreciate help with how to grep (or similar) a list of words to
determine if any of them are in a file rather than grepping one word at a
time.

Thanks for any suggestions...

All the best,
Jack

(^_^)
Happy trails,
Jack L. Stone

System Admin
Sage-american


Re: Grepping a list of words

2010-08-12 Thread Daniel Bye
On Wed, Aug 11, 2010 at 06:00:22PM -0500, Jack L. Stone wrote:
 Kindly appreciate help with how to grep (or similar) a list of words to
 determine if any of them are in a file rather than grepping one word at a
 time.

Something like this should do the trick:

egrep '(word1|word2|word3)' file

Dan

-- 
Daniel Bye




Re: Grepping a list of words

2010-08-12 Thread Thomas Dickey
On Wed, Aug 11, 2010 at 06:00:22PM -0500, Jack L. Stone wrote:
 Kindly appreciate help with how to grep (or similar) a list of words to
 determine if any of them are in a file rather than grepping one word at a
 time.

put the list in a file, and use grep -f

better, use the \< and \> markers on the file's contents and use egrep.

(grep -w option is likely to be buggy when available).
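The word-marker suggestion can be sketched by generating the pattern file with sed. The markers are \< (start of word) and \> (end of word); they are an extension found in GNU grep and several other greps rather than part of POSIX, so check your grep(1). The sample files are hypothetical.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'cat\ndog\n' > "$tmp/words"
printf 'concatenate\nthe cat sat\nhotdogs\n' > "$tmp/file"

# Wrap each word as \<word\> so it only matches at word boundaries.
sed 's/.*/\\<&\\>/' "$tmp/words" > "$tmp/patterns"

# grep -E is the same as egrep.
out=$(grep -E -f "$tmp/patterns" "$tmp/file")
printf '%s\n' "$out"
```

Only "the cat sat" is printed: "cat" inside "concatenate" and "dog" inside "hotdogs" are not at word boundaries.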

-- 
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net




Re: Grepping a list of words

2010-08-12 Thread Anonymous
Jack L. Stone ja...@sage-american.com writes:

 Kindly appreciate help with how to grep (or similar) a list of words to
 determine if any of them are in a file rather than grepping one word at a
 time.

Perhaps the `-e' option?

  $ printf 'foo\nbar\n' | fgrep -e foo -e bar
  foo
  bar


Re: Grepping a list of words

2010-08-12 Thread Rodrigo Gonzalez
On Wed, 11 Aug 2010 18:00:22 -0500
Jack L. Stone ja...@sage-american.com wrote:

 Kindly appreciate help with how to grep (or similar) a list of words
 to determine if any of them are in a file rather than grepping one
 word at a time.
 

Use egrep

egrep '(word1|word2)' file




Re: Grepping a list of words

2010-08-12 Thread Arthur Chance

On 08/12/10 00:00, Jack L. Stone wrote:

Kindly appreciate help with how to grep (or similar) a list of words to
determine if any of them are in a file rather than grepping one word at a
time.


fgrep, aka grep -F

A snippet from man grep:

   -F, --fixed-strings
  Interpret  PATTERN as a list of fixed strings, separated
  by newlines, any of which is to be matched.
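Since -F treats a newline in PATTERN as a separator, several fixed strings can also be passed in a single argument without a pattern file. A small sketch with made-up data; the "." in "foo.c" is matched literally, not as a regex wildcard.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'foo.c\nfooXc\nbar\nbaz\n' > "$tmp/file"

# One PATTERN argument holding two fixed strings separated by a newline.
pats=$(printf 'foo.c\nbar')
out=$(grep -F "$pats" "$tmp/file")
printf '%s\n' "$out"
```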



Re: Grepping a list of words

2010-08-12 Thread Ashish SHUKLA
Jack L Stone writes:
 Kindly appreciate help with how to grep (or similar) a list of words to
 determine if any of them are in a file rather than grepping one word at a
 time.

% egrep 'word1|word2|word3|...|wordn' filename.txt

'word1|word2|word3|...|wordn' is the regular expression, so the more you can
minimize it, the better for you :).
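If the words already live in a file, the alternation does not have to be typed by hand: paste(1) can join the lines with "|". This is a sketch with hypothetical file contents, and it assumes the words contain no regex metacharacters (grep -F -f avoids that issue altogether).

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'word1\nword2\nword3\n' > "$tmp/words"
printf 'has word2 here\nnothing\nword3 too\n' > "$tmp/filename.txt"

# paste -s joins all lines of one file; -d '|' puts a bar between them.
re=$(paste -s -d '|' "$tmp/words")    # word1|word2|word3
out=$(grep -E "$re" "$tmp/filename.txt")
printf '%s\n' "$out"
```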

HTH
-- 
Ashish SHUKLA  | GPG: F682 CDCC 39DC 0FEA E116  20B6 C746 CFA9 E74F A4B0
freebsd.org!ashish | http://people.freebsd.org/~ashish/

“There is nothing new to be discovered in physics now; All that
remains is more and more precise measurement.” (Lord Kelvin, 1900)




Re: Grepping a list of words

2010-08-12 Thread Jack L. Stone
At 05:14 PM 8.12.2010 +0530, Ashish SHUKLA wrote:
Jack L Stone writes:
 Kindly appreciate help with how to grep (or similar) a list of words to
 determine if any of them are in a file rather than grepping one word at a
 time.

% egrep 'word1|word2|word3|...|wordn' filename.txt

'word1|word2|word3|...|wordn' is the regular expression, so the more you can
minimize it, the better for you :).

HTH
-- 
Ashish SHUKLA  | GPG: F682 CDCC 39DC 0FEA E116  20B6 C746 CFA9 E74F A4B0

Thanks for the replies. This suggestion won't do the job as the list of
words is very long, maybe 50-60. This is why I asked how to place them all
in a file. One reply dealt with using a file with egrep. I'll try that.

Appreciate the help and any others in case the one doesn't work.

All the best,
Jack

(^_^)
Happy trails,
Jack L. Stone

System Admin
Sage-american


Re: Grepping a list of words

2010-08-12 Thread John Levine
% egrep 'word1|word2|word3|...|wordn' filename.txt

Thanks for the replies. This suggestion won't do the job as the list of
words is very long, maybe 50-60. This is why I asked how to place them all
in a file. One reply dealt with using a file with egrep. I'll try that.

Gee, 50 words, that's about a 300 character pattern, that's not a problem
for any shell or version of grep I know.

But reading the words from a file is equivalent and as you note most
likely easier to do.

R's,
John


Re: Grepping a list of words

2010-08-12 Thread Oliver Fromme
John Levine jo...@iecc.com wrote:
% egrep 'word1|word2|word3|...|wordn' filename.txt
  
   Thanks for the replies. This suggestion won't do the job as the list of
   words is very long, maybe 50-60. This is why I asked how to place them all
   in a file. One reply dealt with using a file with egrep. I'll try that.
  
  Gee, 50 words, that's about a 300 character pattern, that's not a problem
  for any shell or version of grep I know.
  
  But reading the words from a file is equivalent and as you note most
  likely easier to do.

The question is what is more efficient.  This might be
important if that kind of grep command is run very often
by a script, or if it's run on very large files.

My guess is that one large regular expression is more
efficient than many small ones.  But I haven't done real
benchmarks to prove this.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

To this day, many C programmers believe that 'strong typing'
just means pounding extra hard on the keyboard.
-- Peter van der Linden


Re: Grepping a list of words

2010-08-12 Thread Anonymous
Oliver Fromme o...@lurza.secnetix.de writes:

 John Levine jo...@iecc.com wrote:
 % egrep 'word1|word2|word3|...|wordn' filename.txt
   
Thanks for the replies. This suggestion won't do the job as the list of
words is very long, maybe 50-60. This is why I asked how to place them all
in a file. One reply dealt with using a file with egrep. I'll try that.
   
   Gee, 50 words, that's about a 300 character pattern, that's not a problem
   for any shell or version of grep I know.
   
   But reading the words from a file is equivalent and as you note most
   likely easier to do.

 The question is what is more efficient.  This might be
 important if that kind of grep command is run very often
 by a script, or if it's run on very large files.

 My guess is that one large regular expression is more
 efficient than many small ones.  But I haven't done real
 benchmarks to prove this.

BTW, not using regular expressions is even more efficient, e.g.

  $ fgrep -f /usr/share/dict/words /etc/group

When using egrep(1) it takes considerably more time and memory.


Re: Grepping a list of words

2010-08-12 Thread Chip Camden
Quoth Anonymous on Thursday, 12 August 2010:
 Oliver Fromme o...@lurza.secnetix.de writes:
 
  John Levine jo...@iecc.com wrote:
  % egrep 'word1|word2|word3|...|wordn' filename.txt

 Thanks for the replies. This suggestion won't do the job as the list of
 words is very long, maybe 50-60. This is why I asked how to place them all
 in a file. One reply dealt with using a file with egrep. I'll try that.

Gee, 50 words, that's about a 300 character pattern, that's not a problem
for any shell or version of grep I know.

But reading the words from a file is equivalent and as you note most
likely easier to do.
 
  The question is what is more efficient.  This might be
  important if that kind of grep command is run very often
  by a script, or if it's run on very large files.
 
  My guess is that one large regular expression is more
  efficient than many small ones.  But I haven't done real
  benchmarks to prove this.
 
 BTW, not using regular expressions is even more efficient, e.g.
 
   $ fgrep -f /usr/share/dict/words /etc/group
 
 When using egrep(1) it takes considerably more time and memory.

Having written a regex engine myself, I can see why.  Though I'm sure
egrep is highly optimized, even the most optimized DFA table is going to take
more cycles to navigate than a simple string comparison.  Not to mention the
initial overhead of parsing the regex and building that table.

-- 
Sterling (Chip) Camden| sterl...@camdensoftware.com | 2048D/3A978E4F
http://camdensoftware.com | http://chipstips.com| http://chipsquips.com




Re: Grepping a list of words

2010-08-12 Thread John R. Levine

 Gee, 50 words, that's about a 300 character pattern, that's not a problem
 for any shell or version of grep I know.

 But reading the words from a file is equivalent and as you note most
 likely easier to do.

The question is what is more efficient.  This might be
important if that kind of grep command is run very often
by a script, or if it's run on very large files.


It's exactly the same, since it's the same program using the same search 
algorithm.  The only thing that's different is the input language for the 
pattern.  What looks like a bunch of separate patterns in the input file 
is internally turned into one pattern that is then compiled into a state 
machine that it uses to match the input.


R's,
John