Re: [newbie] Number of occurrences of each word in a text file

Todd Slater Thu, 19 Aug 2004 06:56:35 -0700

On Thu, Aug 19, 2004 at 07:37:25AM -0600, Russ Kepler wrote:
> On Thursday 19 August 2004 06:45 am, Todd Slater wrote:
> 
> > 2. Get unique words to count from masterwordlist.
> >
> >    uniq masterwordlist > uniqwords
> >
> > 3. Count the number of times a word in uniqwords appears in
> >    masterwordlist.
> >
> >    for line in `cat uniqwords` ; do echo $line : `grep -cx $line
> >    masterwordlist` >> countedwords ; done
> >
> >    (that should all be one line)
> 
> You might consider combining these steps with "uniq -c "


Ah, yes! Much easier, thank you!
 
> > There are probably more and better ways to do it, but that should work.
> > Modify to suit your needs, like if you want to distinguish between A and
> > a.
> 
> Toss a "tr A-Z a-z" into the mix for that.  You end up with something like
> this:
> 
> tr ' \011' '\012\012' < text.txt | tr A-Z a-z | sort | uniq -c

I did that in step 1. I suppose you could also just use the -i switch to
uniq to ignore case (with sort -f, so that words differing only in case
end up next to each other), so we might end up with

tr ' \011' '\012\012' < text.txt | tr -d '[:punct:]' | sort -f | uniq -ic
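
Here is a rough sketch of that pipeline on a tiny made-up sample, in case
it helps to see it run. The sample text and the count_words name are my
own inventions, not anything from the original file. It assumes GNU uniq
(as far as I know the -i switch is not in POSIX), quotes '[:punct:]' so
the shell cannot expand it as a glob, and uses sort -f so that "The" and
"the" land next to each other before uniq sees them.

```sh
# Hypothetical sample standing in for text.txt from the thread.
sample=$(mktemp)
printf 'The cat saw the dog.\nThe dog ran.\n' > "$sample"

# count_words is an illustrative helper name, not part of the thread.
# Split on spaces and tabs, strip punctuation, sort ignoring case,
# then count adjacent duplicates ignoring case.
count_words() {
    tr ' \011' '\012\012' < "$1" | tr -d '[:punct:]' | sort -f | uniq -ic
}

count_words "$sample"
```

Which spelling uniq prints for a merged word ("The" or "the") depends on
which one sort happened to put first, so folding case with tr beforehand
may give tidier output.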

Now, how to sort the results of that by count so they come out in
numeric order, i.e. 1, 2, 3, 4, etc., instead of the computer way
(1, 10, 11, 2, ...)?
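
My best guess is a second sort with the -n switch, which compares the
leading count as a number rather than as text. A minimal sketch, assuming
the counts come first on each line the way uniq -c prints them; the three
lines of sample data are made up for illustration:

```sh
# Hypothetical uniq -c style output: count first, then the word.
counts='     10 the
      9 dog
      2 cat'

# -n compares the leading number numerically, so 9 sorts before 10.
printf '%s\n' "$counts" | sort -n
```

Adding -r (sort -nr) should reverse that and put the most frequent words
at the top.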

Todd

