Repeated Words

2001-04-25 Thread Helio S. Junior

Hello,

How do I read a simple file and look for "repeated
words" in each line, writing out the words I have
found and the line numbers where they were found?

eg:
File ==> Test.Dat
sample line of text.
this line follows another line.
This is the last line.

The program should report:

Repeated Word(s): 'line'  on Line 2.


Any Sample code?

TIA,

H3li0


__
Do You Yahoo!?
Yahoo! Auctions - buy the things you want at great prices
http://auctions.yahoo.com/



Re: Repeated Words

2001-04-25 Thread Kevin Meltzer

Hi Helio,

A similar question came up on the list. Please refer to the thread
http://archive.develooper.com/beginners%40perl.org/msg00334.html and you should
find (at least a good portion of) an answer to your question.

Cheers,
Kevin

On Wed, Apr 25, 2001 at 04:22:17AM -0700, Helio S. Junior ([EMAIL PROTECTED]) 
spew-ed forth:
> Hello,
> 
> How do i read a simple file and look for "repeated
> words" in each line, writing out the words i have
> found and the numbers of line they were found?
> 
> eg:
> File ==> Test.Dat
> sample line of text.
> this line follows another line.
> This is the last line.
> 

Writing CGI Applications with Perl - http://perlcgi-book.com
-- 
"Families is where our nation finds hope, where wings take dream."
-- G.W. Bush, LaCrosse, WI 10/18/2000



Re: Repeated Words

2001-04-25 Thread Michael Lamertz

Helio S. Junior ([EMAIL PROTECTED]) wrote:
> Hello,
> 
> How do i read a simple file and look for "repeated
> words" in each line, writing out the words i have
> found and the numbers of line they were found?

Well, that sounds like homework to me, so no sample code...

Think about using either a hash or an array for storing the previous
line's words.  With a hash, you can easily look up whether a word in
the current line has already been seen and report it.  With an array,
do a 'perldoc -f grep' to find out how to look for words in there.

Do a 'perldoc perlvar' to find out how to get the current input's
line number.

Mike

-- 
 If we fail, we will lose the war.

Michael Lamertz  | [EMAIL PROTECTED] / [EMAIL PROTECTED]
Nordstr. 49  | http://www.lamertz.net
50733 Cologne| Work: +49 221 3091-121
Germany  | Priv: +49 221 445420 / +49 171 6900 310



Re: Repeated Words

2001-04-25 Thread Paul


--- "Helio S. Junior" <[EMAIL PROTECTED]> wrote:
> Hello,

Hi =o)

> How do i read a simple file and look for "repeated
> words" in each line, writing out the words i have
> found and the numbers of line they were found?
> 
> eg:
> File ==> Test.Dat
> sample line of text.
> this line follows another line.
> This is the last line.
> 
> The program should report:
> 
> Repeated Word(s): 'line'  on Line 2.

I was tempted to use something like 
 $line =~ /(\b\w+\b).*\1/o;

to find repeats, but don't -- it only reads the first repeated word,
and gets more complex to fix after that.

Instead, try doing it manually:


open DAT, "Test.dat" or die $!;
my $ln = 0;
foreach my $line (<DAT>) {
    chomp $line;
    $ln++;
    my %hit = ();
    # count every word on this line
    foreach my $word (split /\W+/o, $line) { $hit{$word}++ }
    # report each word seen more than once
    foreach my $word (keys %hit) {
        print "Repeated Word(s): '$word'  on Line $ln.\n"
            if $hit{$word} > 1;
    }
}
close DAT;

=Test.dat
sample line of text.
this line follows another line.
This is the last line.
and foo and foo and foo.
=

You could even use this to tell you how many times the word appeared on
the line by adding $hit{$word} to the printed line, etc.

This could be condensed, but it works.
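
As a rough sketch of the count idea above (not part of the original message): the lines are held in an array instead of being read from Test.dat, and the sub name repeated_words is made up for illustration. Words are compared case-sensitively, as in the code above, and are reported in order of first appearance rather than hash order.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [word, count] pairs for every word that
# occurs more than once in $line, in order of first appearance.
sub repeated_words {
    my ($line) = @_;
    my (%hit, @order);
    # grep { length } drops the empty field split can return when the
    # line starts with a non-word character
    foreach my $word (grep { length } split /\W+/, $line) {
        push @order, $word unless $hit{$word}++;
    }
    return map { [ $_, $hit{$_} ] } grep { $hit{$_} > 1 } @order;
}

# Sample data copied from the Test.dat listing above
my @lines = (
    "sample line of text.",
    "this line follows another line.",
    "This is the last line.",
    "and foo and foo and foo.",
);

my $ln = 0;
foreach my $line (@lines) {
    $ln++;
    foreach my $pair (repeated_words($line)) {
        my ($word, $count) = @$pair;
        print "Repeated Word(s): '$word' ($count times) on Line $ln.\n";
    }
}
```

With the sample data this should report 'line' twice on line 2, and 'and' and 'foo' three times each on line 4.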




Re: Repeated Words

2001-04-25 Thread Steven . Spears


I believe if you add the (g)lobal modifier, and optionally (i), Paul's line
of code may work:

 $line =~ /(\b\w+\b).*\1/ogi;

This is also addressed a little differently in Programming Perl, page 149

Steven Spears
(905)-405-0955
[EMAIL PROTECTED]



   
  











Re: Repeated Words

2001-04-25 Thread Paul


--- [EMAIL PROTECTED] wrote:
> I believe if you add the (g)lobal modifier and optionally(i), Paul's
> line of code may work:
> 
>  $line =~ /(\b\w+\b).*\1/ogi;

I didn't actually test it -- but wouldn't it still miss interlaced
words? like:

 it might, it might

Would it see that "might" was repeated as well?
I think the first one would get thrown away by the .* in the pattern
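
One way to check this is to run both approaches on the sample line. The sketch below is not from the thread's authors, and the variable names are illustrative. In list context with /g, the match returns every word the pattern captures; a hash count over the same line is shown for comparison.

```perl
use strict;
use warnings;

my $line = "it might, it might";

# In list context, /g returns each $1 the pattern captures.
my @by_regex = $line =~ /(\b\w+\b).*\1/gi;

# Hash-based count of every word on the line.
my %hit;
$hit{$_}++ for grep { length } split /\W+/, $line;
my @by_hash = sort grep { $hit{$_} > 1 } keys %hit;

print "regex: @by_regex\n";
print "hash:  @by_hash\n";
```

On this input the regex reports only "it", because the first match runs from the first "it" through the second one and the search resumes after that point, where "might" occurs just once. The hash count reports both "it" and "might".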




Re: Repeated Words

2001-04-25 Thread Sean O'Leary

At 10:48 AM 4/25/2001, you wrote:
> > I believe if you add the (g)lobal modifier and optionally(i), Paul's
> > line of code may work:
> >
> >  $line =~ /(\b\w+\b).*\1/ogi;

You don't need /o, because there are no variables in the pattern, so Perl 
will compile the regex at compile time anyway.  I guess it doesn't hurt, 
though.

I looked through some stuff for items like this, and found something in the 
Perl Cookbook that almost fit the bill.  I modified it *very* slightly so 
that it can catch punctuation between words.  Take a look at the comments 
and you'll know what to pull out if you don't want that behavior.

And before anyone goes thinking I'm a regex guru, I'm not.  I really did 
copy 99% of this out of the Perl Cookbook.  (BTW, the three 
guys who wrote it are *really* good people to copy off of. :-) )  I know 
Perl kinda well, but of all its features, I'm weakest on regexes.

Anyway, here it be.

while (<>) {
    while ( m{
                \b        # Start at a word boundary
                (\S+)     # \S means non-whitespace, whose
                          # beginning and end are alphanums
                          # if you put \b's around it.
                \b        # until you hit another boundary
                \W?       # optionally match a non-word char
                (
                    \s+   # Then some space
                    \1    # Then what we matched in (\S+) above
                    \b    # with a boundary after it
                )
            }xig          # eXtended (Comments), case Insensitive
                          # Global
          )
    {
        print "Dupe word '$1' at line $.\n";
    }
}
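
Note that this pattern appears to catch only a word that is immediately followed by itself, so it would not report 'line' in the original Test.Dat example. A minimal sketch to check that, wrapping the same pattern in a made-up sub named adjacent_dupes and feeding it strings instead of reading from <>:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern above: returns each word that
# is immediately followed by itself on $line.
sub adjacent_dupes {
    my ($line) = @_;
    my @found;
    while ( $line =~ m{
                \b (\S+) \b    # a chunk bounded by word boundaries
                \W?            # optional punctuation
                ( \s+ \1 \b )  # whitespace, then the same chunk again
            }xig )
    {
        push @found, $1;
    }
    return @found;
}

my @adjacent = adjacent_dupes("and foo, foo again");
my @spread   = adjacent_dupes("this line follows another line.");

print "adjacent: @adjacent\n";
print "spread out: ", scalar(@spread), " match(es)\n";
```

The first call should find "foo"; the second finds nothing, since the two occurrences of "line" are separated by other words. For that case the hash-based counting shown earlier in the thread is the better fit.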

Thank you for your time,

Sean.