Hi Everybody,

I'm working through the "Mastering Regular Expressions" book right now. After
reading the question earlier this week about counting the unique words in
some input, and the really great responses everybody sent, I wondered whether
I could think of another way to do it (since "there's always more than one
way . . ." and, as a regex beginner, I thought it would be good practice).
I did come up with another way (I think), and I was hoping for some feedback
about why it might not work or why it would be a bad approach.  Here's the
code I came up with:

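my $matches = ''; ## running with "use strict;"
while (<>) {
        while (/(\w+)/g) {
                my $word = $1;  # copy $1 before the next match can change it
                # \b anchors keep "the" from matching inside "other"
                $matches .= "$word " unless $matches =~ /\b$word\b/i;
        }
}
# one space follows each stored word, so counting spaces counts the words
my $unique_words = () = $matches =~ /\s/g;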

Thanks in advance for any comments!

-Dave

>>At 08:59 AM 1/22/02 -0600, Booher Timothy B 1stLt AFRL/MNAC wrote:
>>I am trying to write a perl script to count the words (not counting
>>duplicates) in a file based on the following definition of word:
>>
>>"A word is any collection of characters separated by white space or
>>punctuation characters such as {.!?,}"
>>
>>I have a lot of ideas, but also the suspicion that someone else has done
>>this before. Here is my basic approach.
>>
>>--> create two-dimensional array with following axes {x = word.length, y =
>>word.string}
>>--> read line
>>         --> read first word
>>         --> compare word against entire column of similar sized words
>>                 if found then promote word one higher in column
>>                 else add word to the bottom of the column and increment
>>                 word count
>>
>>Any ideas on a more efficient approach -- anything else out there that does
>>this?


>Whoa, sounds like someone hasn't met hashes yet.
>
>Hashes are the first coolest thing you encounter when learning Perl (unless 
>you've come from awk, which I don't think you have).
>
>If we accept the set of word characters as being defined by \w, your 
>problem can be solved with this code:
>
>         my %word;
>         while (<>) {
>           $word{$_}++ for /(\w+)/g;
>         }
>
>Somewhat simpler than you were imagining?  Here's how it works:
>
>         my %word;
>
>Declare hash (since the code is going to run with "use strict").
>
>         while (<>) {
>
>While we can read a line from either files named on the command line or 
>standard input, put the line into the variable $_
>
>           for /(\w+)/g;
>
>Loop over all groups of consecutive word characters in $_, putting each one 
>into a temporary $_
>
>         $word{$_}++
>
>Increment the count stored in the hash corresponding to that word.  If 
>there isn't one there yet, create one with an initial value of 0, then add 
>1 to it.
>
>After the end of the loop you can dump the concordance with something like:
>
>         print "$_: $word{$_}\n" for sort keys %word;
>
>
>
>--
>Peter Scott
>Pacific Systems Design Technologies
>http://www.perldebugged.com
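
For comparison, here is a minimal sketch of how the hash approach quoted
above might be used to answer the original question (the number of unique
words). The sub name count_unique_words and the lc() case folding are
assumptions added for illustration; they are not part of Peter's message.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count the distinct words
# found in a list of lines, ignoring case.
sub count_unique_words {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        # lc() folds case so that "The" and "the" are counted once
        $seen{lc $_}++ for $line =~ /(\w+)/g;
    }
    # keys in scalar context gives the number of distinct keys
    return scalar keys %seen;
}

print count_unique_words("The cat and the other cat."), "\n";   # prints 4
```

Because each word is a hash key, checking whether it has been seen before
does not require rescanning a growing string, and a short word such as "the"
cannot be confused with part of a longer one such as "other".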
