Re: Pattern matching problem - Why won't this work?

2010-04-13 Thread Shawn H Corey

Owen Chavez wrote:
> Hello!
>
> I have a pattern matching question using Perl 5.10, Windows 7.  Suppose I
> have a file containing the following block of text:
>
> Hello there TODD
> I my We Us ourselves OUr I.
>
> The file has 10 words, including 7 first-person pronouns (and 3 non-pronouns
> that I have no interest in).
>
> I've scrabbled together the following code:
>
> #!/usr/bin/perl
> use strict;
> use warnings;
>
> my @prnouns1 = qw(I we me us myself ourselves mine ours my our);
>
> ...

my $n_words = 0;
my $fst_prsn = 0;

> while (my $line = <>)
> {

# replace starting here

>   chomp $line;
>   my @strings = split /\s+/, $line;
>   my @words = grep /\w+/, @strings;

# to here with:
my @words = split /\W+/, $line;

>   my $n_words += scalar(@words);

$n_words += scalar( @words );

>   $fst_prsn += scalar (grep {my $comp1 = $_; grep {$_ =~ /\b$comp1\b/ig}
> @words} @prnouns1);

for my $pronoun ( @prnouns1 ){
  for my $word ( @words ){
    $fst_prsn++ if lc( $word ) eq lc( $pronoun );
  }
}

> }
> print "Result: Number of words: $n_words - First: $fst_prsn\n";



--
Just my 0.0002 million dollars worth,
  Shawn

Programming is as much about organization and communication
as it is about coding.

I like Perl; it's the only language where you can bless your
thingy.

Eliminate software piracy:  use only FLOSS.

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Pattern matching problem - Why won't this work?

2010-04-13 Thread Peter Scott
On Mon, 12 Apr 2010 23:04:53 -0500, Owen Chavez wrote:
> Can you suggest a reference on hashes that will provide some clue as to
> how they can be used for the problem I posted?  I've looked over
> Programming Perl (3rd) and it's not entirely clear to me how to proceed
> with a hash.

Learning Perl 5th ed. by Randal, Tom, and brian is the best book for what 
you want.  If you want a video, there is my Perl Fundamentals (informit 
link below).  And if you want an online instruction course, the O'Reilly 
School of Technology (last link below).

You're just counting how many times a pronoun shows up in a list of words 
extracted from text.  The core of such code will be something like:

$is_pronoun{lc $word} and $count{lc $word}++;
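
For illustration, here is a minimal sketch of how that line might sit in a
complete script. The helper name count_pronouns and the hard-coded sample
lines are assumptions made for this example; they are not from the original
post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lookup table: one key per pronoun, stored in lower case.
my %is_pronoun = map { $_ => 1 } qw(i we me us myself ourselves mine ours my our);

# count_pronouns is a hypothetical helper name used only in this sketch.
# Returns (total words, total pronouns, hashref of per-pronoun counts).
sub count_pronouns {
    my @lines = @_;
    my ($n_words, $fst_prsn) = (0, 0);
    my %count;
    for my $line (@lines) {
        for my $word ($line =~ /\w+/g) {    # every run of word characters
            $n_words++;
            next unless $is_pronoun{lc $word};
            $count{lc $word}++;
            $fst_prsn++;
        }
    }
    return ($n_words, $fst_prsn, \%count);
}

my ($n_words, $fst_prsn, $count) =
    count_pronouns('Hello there TODD', 'I my We Us ourselves OUr I.');
print "Result: Number of words: $n_words - First: $fst_prsn\n";
```

Each word costs one hash lookup, so the pronoun list is never rescanned per
word, and the per-pronoun totals come for free in %count.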

-- 
Peter Scott
http://www.perlmedic.com/ http://www.perldebugged.com/
http://www.informit.com/store/product.aspx?isbn=0137001274
http://www.oreillyschool.com/courses/perl1/





Re: Pattern matching problem - Why won't this work?

2010-04-12 Thread Owen Chavez
Thank you for the feedback.  I do apologize for not posting a working
example; I can't post the full code and I was attempting to extract the
offending sections.

I have no particular fondness for grep.  A search of postings on perlmonks
revealed a variation of the code I employed.  I am learning perl as quickly
and thoroughly as I can, but I am cannibalizing some code for the purpose of
completing my tasks within a foreseeable time frame.

Can you suggest a reference on hashes that will provide some clue as to how
they can be used for the problem I posted?  I've looked over Programming
Perl (3rd) and it's not entirely clear to me how to proceed with a hash.

Owen

On Mon, Apr 12, 2010 at 10:11 PM, Peter Scott  wrote:

> On Mon, 12 Apr 2010 21:06:58 -0500, Owen Chavez wrote:
> > I have a pattern matching question using Perl 5.10, Windows 7.  Suppose
> > I have a file containing the following block of text:
> >
> > Hello there TODD
> > I my We Us ourselves OUr I.
> >
> > The file has 10 words, including 7 first-person pronouns (and 3
> > non-pronouns that I have no interest in).
> >
> > I've scrabbled together the following code:
> >
> > #!/usr/bin/perl
> > use strict;
> > use warnings;
> >
> > my @prnouns1 = qw(I we me us myself ourselves mine ours my our);
> >
> > ...
> >
> > while (my $line = <>)
> > {
> >   chomp $line;
> >   my @strings = split /\s+/, $line;
> >   my @words = grep /\w+/, @strings;
> >   my $n_words += scalar(@words);
> >   $fst_prsn += scalar (grep {my $comp1 = $_; grep {$_ =~ /\b$comp1\b/ig}
> > @words} @prnouns1);
> > }
> > print "Result: Number of words: $n_words - First: $fst_prsn\n";
> >
> > The result produced by this code is incorrect:
>
> The result is correct, it's the code that is incorrect :-)
>
> > Result: Number of words: 10 - First: 6
> >
> > It's not counting the second "I" although I've included the /g modifier.
> >  Can anyone tell me why?  How can I accomplish this?
>
> First, it helps if you post a complete working example.  Yours isn't;
> $n_words isn't in scope when you print it.
>
> Second, you seem to be very fond of the grep function, but I don't think
> you understand what it is for or how it works.  It is not identical to
> the Unix utility.  Your problem is that when the result of the outer
> block is a list containing more than one element, you seem to think that
> should somehow let through more than one element from the list being
> passed in.  You will not get more elements out of grep than you put in.
>
> Third, you would achieve the goals of this code much more understandably
> if you used a hash.  If you don't know what hashes are, now is the time
> to learn.
>
> --
> Peter Scott
> http://www.perlmedic.com/ http://www.perldebugged.com/
> http://www.informit.com/store/product.aspx?isbn=0137001274
> http://www.oreillyschool.com/courses/perl1/
>


Re: Pattern matching problem - Why won't this work?

2010-04-12 Thread Peter Scott
On Mon, 12 Apr 2010 21:06:58 -0500, Owen Chavez wrote:
> I have a pattern matching question using Perl 5.10, Windows 7.  Suppose
> I have a file containing the following block of text:
> 
> Hello there TODD
> I my We Us ourselves OUr I.
> 
> The file has 10 words, including 7 first-person pronouns (and 3
> non-pronouns that I have no interest in).
> 
> I've scrabbled together the following code:
> 
> #!/usr/bin/perl
> use strict;
> use warnings;
> 
> my @prnouns1 = qw(I we me us myself ourselves mine ours my our);
> 
> ...
> 
> while (my $line = <>)
> {
>   chomp $line;
>   my @strings = split /\s+/, $line;
>   my @words = grep /\w+/, @strings;
>   my $n_words += scalar(@words);
>   $fst_prsn += scalar (grep {my $comp1 = $_; grep {$_ =~ /\b$comp1\b/ig}
> @words} @prnouns1);
> }
> print "Result: Number of words: $n_words - First: $fst_prsn\n";
> 
> The result produced by this code is incorrect:

The result is correct, it's the code that is incorrect :-)

> Result: Number of words: 10 - First: 6
> 
> It's not counting the second "I" although I've included the /g modifier.
>  Can anyone tell me why?  How can I accomplish this?

First, it helps if you post a complete working example.  Yours isn't; 
$n_words isn't in scope when you print it.

Second, you seem to be very fond of the grep function, but I don't think 
you understand what it is for or how it works.  It is not identical to 
the Unix utility.  Your problem is that when the result of the outer 
block is a list containing more than one element, you seem to think that 
should somehow let through more than one element from the list being 
passed in.  You will not get more elements out of grep than you put in.
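
A small sketch of that point, using the second line of the sample text. The
sub names are made up for this illustration, and the comparison uses lc/eq
instead of the original regex to keep the example short.

```perl
use strict;
use warnings;

my @pronouns = qw(I we me us myself ourselves mine ours my our);
my @words    = qw(I my We Us ourselves OUr I);

# Outer grep filters @pronouns: a pronoun passes at most once, however many
# words it matches, so the result can never exceed scalar(@pronouns).
sub count_distinct_pronouns {
    my ($pronouns, $words) = @_;
    return scalar grep {
        my $p = $_;
        grep { lc $_ eq lc $p } @$words;
    } @$pronouns;
}

# Outer grep filters @words instead: every word that is a pronoun passes,
# so a repeated "I" is counted each time it appears.
sub count_pronoun_words {
    my ($pronouns, $words) = @_;
    return scalar grep {
        my $w = $_;
        grep { lc $w eq lc $_ } @$pronouns;
    } @$words;
}

print "distinct pronouns seen: ", count_distinct_pronouns(\@pronouns, \@words), "\n";
print "words that are pronouns: ", count_pronoun_words(\@pronouns, \@words), "\n";
```

The first sub mirrors the shape of the posted code and reports 6; swapping
which list the outer grep filters gives the 7 that was expected.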

Third, you would achieve the goals of this code much more understandably 
if you used a hash.  If you don't know what hashes are, now is the time 
to learn.

-- 
Peter Scott
http://www.perlmedic.com/ http://www.perldebugged.com/
http://www.informit.com/store/product.aspx?isbn=0137001274
http://www.oreillyschool.com/courses/perl1/





Pattern matching problem - Why won't this work?

2010-04-12 Thread Owen Chavez
Hello!

I have a pattern matching question using Perl 5.10, Windows 7.  Suppose I
have a file containing the following block of text:

Hello there TODD
I my We Us ourselves OUr I.

The file has 10 words, including 7 first-person pronouns (and 3 non-pronouns
that I have no interest in).

I've scrabbled together the following code:




#!/usr/bin/perl
use strict;
use warnings;

my @prnouns1 = qw(I we me us myself ourselves mine ours my our);

...

while (my $line = <>)
{
  chomp $line;
  my @strings = split /\s+/, $line;
  my @words = grep /\w+/, @strings;
  my $n_words += scalar(@words);
  $fst_prsn += scalar (grep {my $comp1 = $_; grep {$_ =~ /\b$comp1\b/ig}
@words} @prnouns1);
}
print "Result: Number of words: $n_words - First: $fst_prsn\n";




The result produced by this code is incorrect:

Result: Number of words: 10 - First: 6

It's not counting the second "I" although I've included the /g modifier.
 Can anyone tell me why?  How can I accomplish this?

Owen Chavez


Re: Pattern matching problem

2008-02-29 Thread Gunnar Hjalmarsson

Anirban Adhikary wrote:


Subject: Pattern matching problem


As far as I can tell, this is not a pattern matching problem.


I have a very large file basically it is logfile generated by sql
loader. In the production environment this file can have one
million/ two million data. In this  file there are 4 particular lines which
i need to extract from this log file.

*Total logical records skipped:  0
Total logical records read:  4830
Total logical records rejected:51
Total logical records discarded: 4760
*
These four lines stayed at the bottom of the. Now if I use a filehandel to
open the file and stored it contents in an array


Why would you store its contents in an array? Typically you read a file 
line by line.



and after that I make a
search to find these 4 lines then it will take lot of times to get output.
So is there any other way where I dont need to store the file in a array and
I can directly search the file and when I find these lines I can store these
lines in some array or variables.


This is how you can read the file line by line:

open my $fh, '<', 'sql.log' or die $!;
while ( <$fh> ) {
    print if substr($_, 0, 13) eq 'Total logical';
}

Since the file is large, and you know that what you are looking for is 
near the end of it, you can use the seek() function to speed up the process.


open my $fh, '<', 'sql.log' or die $!;
seek $fh, -500, 2 or die $!;
while ( <$fh> ) {
    print if substr($_, 0, 13) eq 'Total logical';
}

See "perldoc -f seek".
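
A slightly more defensive sketch of the same idea, assuming the summary
lines always begin with "Total logical records". The helper name
tail_totals and its $tail_bytes parameter are made up for this example.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# tail_totals is a hypothetical helper; $tail_bytes is how much of the end
# of the file to scan.
sub tail_totals {
    my ($path, $tail_bytes) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";

    # Only seek when the file is longer than the tail we want; seeking to a
    # position before the start of the file fails.
    if ((-s $fh) > $tail_bytes) {
        seek($fh, -$tail_bytes, SEEK_END) or die "Cannot seek in $path: $!";
    }

    # A partial first line after the seek does not begin with the anchored
    # text, so it is simply skipped.
    my @totals = grep { /^Total logical records/ } <$fh>;
    close $fh;
    return @totals;
}

# Usage: perl tail_totals.pl sql.log
print tail_totals($ARGV[0], 500) if @ARGV;
```

Fcntl's SEEK_END constant stands in for the literal 2, and the size check
keeps the script usable on short test files.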

An alternative is to make use of the module File::ReadBackwards.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl





Pattern matching problem

2008-02-29 Thread Anirban Adhikary
Dear List,

I have a very large file; it is a log file generated by SQL*Loader. In the
production environment this file can contain one or two million records. In
this file there are four particular lines which I need to extract:

Total logical records skipped:          0
Total logical records read:          4830
Total logical records rejected:        51
Total logical records discarded:     4760

These four lines are at the bottom of the file. If I use a filehandle to
open the file, store its contents in an array, and then search for these
four lines, it takes a long time to get the output. Is there another way,
where I don't need to store the file in an array and can search the file
directly, storing the lines in an array or in variables when I find them?
I am pasting part of the file here.

---

SQL*Loader: Release 9.2.0.1.0 - Production on Tue Feb 5 10:58:04 2008

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Control File:   FINAL.ctl
Data File:  FINAL.DAT
  Bad File: FINAL.bad
  Discard File: FINAL.dsc
 (Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array: 64 rows, maximum of 256000 bytes
Continuation:none specified
Path used:  Conventional

Table DIM_DIAL_DIGIT, loaded when ISO_COUNTRY_CODE = 0X54554e(character
'TUN')
Insert option in effect for this table: APPEND

   Column Name               Position   Len  Term Encl Datatype
---------------------------- ---------- ----- ---- ---- ----------------------
DIAL_DIGIT_KEY                    FIRST     *   ;  O(") CHARACTER
BU_KEY                             NEXT     *   ;  O(") CHARACTER
    NULL if BU_KEY = 0X4e554c4c(character 'NULL')
NOP_ID_KEY                         NEXT     *   ;  O(") CHARACTER
SDCA_LOCATION_CODE                 NEXT     *   ;  O(") CHARACTER
TARGET_REGION_DESC                 NEXT     *   ;  O(") CHARACTER
TARGET_COUNTRY_CODE                NEXT     *   ;  O(") CHARACTER
    NULL if TARGET_COUNTRY_CODE = 0X4e554c4c(character 'NULL')
TARGET_COUNTRY_DESC                NEXT     *   ;  O(") CHARACTER
LDCA_NAME                          NEXT     *   ;  O(") CHARACTER
SDCA_NAME                          NEXT     *   ;  O(") CHARACTER
LDCC_X_COORD                       NEXT     *   ;  O(") CHARACTER
    NULL if LDCC_X_COORD = 0X4e554c4c(character 'NULL')
LDCC_Y_COORD                       NEXT     *   ;  O(") CHARACTER
    NULL if LDCC_Y_COORD = 0X4e554c4c(character 'NULL')
SDCC_X_COORD                       NEXT     *   ;  O(") CHARACTER
    NULL if SDCC_X_COORD = 0X4e554c4c(character 'NULL')
SDCC_Y_COORD                       NEXT     *   ;  O(") CHARACTER
    NULL if SDCC_Y_COORD = 0X4e554c4c(character 'NULL')
POPULATION_DATE_TIME               NEXT     *   ;  O(") DATE MM/DD/ HH24:MI:SS
    NULL if POPULATION_DATE_TIME = 0X4e554c4c(character 'NULL')
ISO_COUNTRY_CODE                   NEXT     *   ;  O(") CHARACTER
HOTLIST_IND                        NEXT     *   ;  O(") CHARACTER
BLACKLIST_IND                      NEXT     *   ;  O(") CHARACTER
UPDATE_DATE_TIME                   NEXT     *   ;  O(") DATE MM/DD/ HH24:MI:SS
    NULL if UPDATE_DATE_TIME = 0X4e554c4c(character 'NULL')
EVENT_TYPE_KEY                     NEXT     *   ;  O(") CHARACTER
    NULL if EVENT_TYPE_KEY = 0X4e554c4c(character 'NULL')
PROVIDER_DESCRIPTION               NEXT     *   ;  O(") CHARACTER
DM_IND                             NEXT     *   ;  O(") CHARACTER
DIAL_DIGIT_OPERATOR_TYPE           NEXT     *   ;  O(") CHARACTER
CALL_DIRECTION_KEY                 NEXT     *   ;  O(") CHARACTER
    NULL if CALL_DIRECTION_KEY = 0X4e554c4c(character 'NULL')
DIAL_DIGIT_DESCRIPTION             NEXT     *   ;  O(") CHARACTER
FORCE_RI_IND                       NEXT     *   ;  O(") CHARACTER
TEST_CALL_IND                      NEXT     *   ;  O(") CHARACTER

Record 1: Discarded - failed all WHEN clauses.
Record 2: Discarded - failed all WHEN clauses.
Record 3: Discarded - failed all WHEN clauses.
Record 4: Discarded - failed all WHEN clauses.
Record 5: Discarded - failed all WHEN clauses.
Record 6: Discarded - failed all WHEN clauses.
..
.

Record 482500: Discarded - failed all WHEN clauses.
Record 485001: Discarded - failed all WHEN clauses.
Record 485002: Discarded - failed all WHEN clauses.
Record 485003: Discarded - failed all WHEN clauses.
Record 485004: Discarded - failed all WHEN clauses.
Record 231: Rejected - Error on table DIM_DIAL_DIGIT.
ORA-1: unique constraint (SCOTT.SYS_C003608) violat

Re: Pattern matching problem

2005-05-10 Thread John Doe
On Tuesday, 10 May 2005 11:23, Kpramod wrote:
> Hi John,
> Try to use 'chop' to get null value
> Thanks and Regards
> Pramod

Hi Pramod,

Sorry, I don't understand what you mean. Are you referring to this line?

 my @new=grep {$_ and !/^\s+$/ and !/^\0+$/} @array1;

(I see that the test for \0 is ugly, but I found nothing better, and I can't
see a use for chop here.)

greetings 
joe


> John Doe wrote:
> >On Tuesday, 10 May 2005 11:01, Tielman Koekemoer (TNE) wrote:
> >>Hi all,
> >>
> >>I have tried various regular expressions to remove null or empty
> >>values on array @array1 and create a new array @OPD01 with the values.
> >>This, however, does not work as I still get a number of empty values
> >>in the @OPD01 array after this processing. As you'll see I tried
> >>various things - check for null(\0), empty lines, lines that do not
> >>contain words etc.
> >>
> >>$counter2 = 0;
> >
> >What's that for? (never used)
> >
> >>foreach $line ( @array1 )
> >>{
> >>$line =~ s/^\s+//;
> >>$line =~ s/\s+$//;
> >>next if $line =~ /!\w/;
> >>next if $line =~ /^\s+$/;
> >>next if $line =~ /'\0'/;
> >>next if $line =~ /^$/;
> >>
> >>$OPD01[$counter]=$line;
> >>$counter++;
> >
> >Use push() to avoid holding the current array index.
> >
> >>}
> >>
> >>Any info would be appreciated.
> >
> >my @array1=(' ', 'a', '', 'b', "\0", 'c', undef, 'd', ' ', 'e');
> >my @new=grep {$_ and !/^\s+$/ and !/^\0+$/} @array1;
> >print join "\n", @new;
> >
> ># prints:
> >a
> >b
> >c
> >d
> >e
> >
> >>TIA
> >>
> >>Tielman





Re: Pattern matching problem

2005-05-10 Thread John Doe
On Tuesday, 10 May 2005 11:46, Tielman Koekemoer (TNE) wrote:
> >> $counter2 = 0;
> >
> >What's that for? (never used)
>
> Hmm yeah sorry that was supposed to be $counter = 0;
>
> >Use push() to avoid holding the current array index.
>
> What do you mean by "holding the index"?

"Remembering (and incrementing) the current end index of @OPD01 in $counter"
(I still can't recall the proper English for that.)
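
A minimal sketch of the loop rewritten with push(), so no index variable is
needed. The helper name clean_lines and the sample values are assumptions
for this example. Note that /!\w/ in the posted loop matches a literal "!"
followed by a word character; "contains no word character" is written
$line !~ /\w/.

```perl
use strict;
use warnings;

# clean_lines is a hypothetical helper name used only for this sketch.
sub clean_lines {
    my @OPD01;
    for my $line (@_) {
        next unless defined $line;                  # skip undef entries
        (my $trimmed = $line) =~ s/^\s+|\s+$//g;    # trim a copy, not the original
        next if $trimmed !~ /\w/;                   # skip empty, blank and "\0" values
        push @OPD01, $trimmed;                      # append; push tracks the end for us
    }
    return @OPD01;
}

my @OPD01 = clean_lines(' ', 'a ', '', ' b', "\0", 'c', undef);
print join("\n", @OPD01), "\n";
```

Because push always appends at the current end of the array, a mistyped
counter name such as $counter2 can no longer leave holes in @OPD01.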

joe

> >my @array1=(' ', 'a', '', 'b', "\0", 'c', undef, 'd', ' ', 'e'); my
>
> @new=grep {$_ and !/^\s+$/ and >!/^\0+$/} @array1; print join "\n",
> @new;
>
> Yes, that worked. Thanks very much.





RE: Pattern matching problem

2005-05-10 Thread Tielman Koekemoer \(TNE\)
Ah I see: use push() to add scalars/lists to arrays. 

Thanks everyone for the help.


> Use push() to avoid holding the current array index.





RE: Pattern matching problem

2005-05-10 Thread Tielman Koekemoer \(TNE\)
 
>> $counter2 = 0;

>What's that for? (never used)

Hmm yeah sorry that was supposed to be $counter = 0;

>Use push() to avoid holding the current array index.

What do you mean by "holding the index"?

>my @array1=(' ', 'a', '', 'b', "\0", 'c', undef, 'd', ' ', 'e'); my
@new=grep {$_ and !/^\s+$/ and >!/^\0+$/} @array1; print join "\n",
@new;

Yes, that worked. Thanks very much.







Re: Pattern matching problem

2005-05-10 Thread Kpramod
Hi John,
Try to use 'chop' to get null value
Thanks and Regards
Pramod
John Doe wrote:
On Tuesday, 10 May 2005 11:01, Tielman Koekemoer (TNE) wrote:
 

Hi all,
I have tried various regular expressions to remove null or empty
values on array @array1 and create a new array @OPD01 with the values.
This, however, does not work as I still get a number of empty values
in the @OPD01 array after this processing. As you'll see I tried
various things - check for null(\0), empty lines, lines that do not
contain words etc.
$counter2 = 0;
   

What's that for? (never used)
 

   foreach $line ( @array1 )
   {
   $line =~ s/^\s+//;
   $line =~ s/\s+$//;
   next if $line =~ /!\w/;
   next if $line =~ /^\s+$/;
   next if $line =~ /'\0'/;
   next if $line =~ /^$/;
   $OPD01[$counter]=$line;
   $counter++;
   

Use push() to avoid holding the current array index.
 

   }
Any info would be appreciated.
   

my @array1=(' ', 'a', '', 'b', "\0", 'c', undef, 'd', ' ', 'e');
my @new=grep {$_ and !/^\s+$/ and !/^\0+$/} @array1;
print join "\n", @new;
# prints:
a
b
c
d
e

 

TIA
Tielman
   

 




Re: Pattern matching problem

2005-05-10 Thread Ing. Branislav Gerzo
Tielman Koekemoer (TNE) [TK], on Tuesday, May 10, 2005 at 11:01
(+0200) contributed this to our collective wisdom:

TK> I have tried various regular expressions to remove null or empty
TK> values on array @array1 and create a new array @OPD01 with the values.
TK> This, however, does not work as I still get a number of empty values
TK> in the @OPD01 array after this processing. As you'll see I tried
TK> various things - check for null(\0), empty lines, lines that do not
TK> contain words etc.

what about grep?
perldoc -f grep

my @OPD01 = grep /\S/, @array1;

--

How do you protect mail on web? I use http://www.2pu.net

[Bullshit Detector.  When alarm sounds,  please re-engage your brain.]







Re: Pattern matching problem

2005-05-10 Thread John Doe
On Tuesday, 10 May 2005 11:01, Tielman Koekemoer (TNE) wrote:
> Hi all,
>
> I have tried various regular expressions to remove null or empty
> values on array @array1 and create a new array @OPD01 with the values.
> This, however, does not work as I still get a number of empty values
> in the @OPD01 array after this processing. As you'll see I tried
> various things - check for null(\0), empty lines, lines that do not
> contain words etc.
>
> $counter2 = 0;

What's that for? (never used)

>
> foreach $line ( @array1 )
> {
> $line =~ s/^\s+//;
> $line =~ s/\s+$//;
> next if $line =~ /!\w/;
> next if $line =~ /^\s+$/;
> next if $line =~ /'\0'/;
> next if $line =~ /^$/;
>
> $OPD01[$counter]=$line;
> $counter++;

Use push() to avoid holding the current array index.

> }
>
> Any info would be appreciated.

my @array1=(' ', 'a', '', 'b', "\0", 'c', undef, 'd', ' ', 'e');
my @new=grep {$_ and !/^\s+$/ and !/^\0+$/} @array1;
print join "\n", @new;

# prints:
a
b
c
d
e
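
One caveat with the filter above: the bare $_ test is a truth test, so it
also throws away the string '0'. A hedged variant that tests for definedness
and for at least one character that is neither whitespace nor NUL is
sketched below; the sub name non_empty is made up for this example.

```perl
use strict;
use warnings;

# non_empty is a hypothetical helper name used only for this sketch.
# Keeps values that are defined and contain at least one character that is
# neither whitespace nor a NUL byte.
sub non_empty {
    return grep { defined($_) && /[^\s\0]/ } @_;
}

my @array1 = (' ', 'a', '', 'b', "\0", 'c', undef, 'd', ' ', 'e', '0');
print join("\n", non_empty(@array1)), "\n";
```

With this version the trailing '0' in the sample list survives, while the
blank, empty, NUL and undef entries are still dropped.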




>
> TIA
>
> Tielman





Pattern matching problem

2005-05-10 Thread Tielman Koekemoer \(TNE\)
Hi all,

I have tried various regular expressions to remove null or empty
values on array @array1 and create a new array @OPD01 with the values.
This, however, does not work as I still get a number of empty values
in the @OPD01 array after this processing. As you'll see I tried
various things - check for null(\0), empty lines, lines that do not
contain words etc.

$counter2 = 0;

foreach $line ( @array1 )
{
$line =~ s/^\s+//;
$line =~ s/\s+$//;
next if $line =~ /!\w/;
next if $line =~ /^\s+$/;
next if $line =~ /'\0'/;
next if $line =~ /^$/;

$OPD01[$counter]=$line;
$counter++;
}

Any info would be appreciated.

TIA

Tielman





Re: Pattern matching problem

2004-02-26 Thread wolf blaum
On Thursday 26 February 2004 12:28, Henry Todd generously enriched virtual 
reality by making up this one:

> On 2004-02-26 00:43:21 +, [EMAIL PROTECTED] (Wolf Blaum) said:
> > As I understand Biology, there is 4 nucleotid acids which gives 4**2
> > combinaions for dupplets. So you need 8 vars to count the occourence of
> > all douplets. Worse for triplets. (24)
> > As I understand genetics, triplets are what matters, since the rma
> > transcriptase reads triplets as code of amino acids. You might give my
> > updates un my biol. knowledge:-)
>
> Wolf -
>
> It's been a while since my A-Level biology days, but I believe you're
> correct. However, this particular coursework was to create two programs
> for a different purpose than I think you're imagining:

Hi, 

As you can tell from my mail, it has been a while since my basic math classes,
too: 4**2 = 8? 4**3 = 24? Uh-oh... (That should be 16 and 64.)
However, the real bug was
for (my $i=0;$i < length($sequence) - $wordsize;$i++){
which should be
for (my $i=0;$i <= length($sequence) - $wordsize;$i++){
because otherwise it misses the last doublet/triplet/...
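
A sketch of the corrected window loop, written with a range so the inclusive
upper bound is explicit. The sub name count_wmers is an assumption for this
example; the sequence is the one quoted in this thread.

```perl
use strict;
use warnings;

# count_wmers is a hypothetical helper name used only for this sketch.
sub count_wmers {
    my ($sequence, $wordsize) = @_;
    my %wmers;
    # The last window starts at length - wordsize, so that index is included.
    for my $i (0 .. length($sequence) - $wordsize) {
        $wmers{ substr($sequence, $i, $wordsize) }++;
    }
    return \%wmers;
}

my $wmers = count_wmers('caggaactttcggaagaccatgta', 2);
print "$_ => $wmers->{$_}\n" for sort keys %$wmers;
```

A sequence of length n has n - w + 1 windows of size w, so the final "ta"
of this sequence is now counted.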

> transition.pl: returns tables of transition probabilities for plus and
> minus models (exon and non-exon regions) as well as beta values
> (log-odds ratios) to compare the two models.
>
> The transition probability for AT for example (the probability that
> adenine will be followed by thymine) is calculated thus:
>
> tp(AT) = |AT| / |A_|
>
> The total number of occurrences of "AT" divided by the total number of
> "A" followed by anything.
>
> The program can also write the transition probabilities to a file to be
> used as input for the other program...

OK, but once you end up with a hash containing all the doublets as its keys
and their frequencies as values, that should be doable as long as you know
the members of your alphabet.
I don't know whether there is such a thing as transition probabilities for
codons (i.e. triplets) as well; if there is, it should show up as transition
probabilities for amino acids. In that case, creating the hash of w-mers is
done by just feeding the script another sequence. The only change would be
to add knowledge about the amino acid alphabet to your script.

> simulation.pl: which asks the user to specify the length of the
> sequence they want, then generates it according to the model file used
> as input (by simulating a Markov chain). So if you supply a file
> containing the transition probabilities of a typical exon (coding)
> region, the simulation will use them to generate a typical exon
> sequence.

This is getting really off topic, but I'm just interested: how do you choose
which letter to start with, since there is no tp for nothing followed by
whatever?

Sounds like a fun problem:)

G'day, Wolf








Re: Pattern matching problem

2004-02-26 Thread Henry Todd
On 2004-02-26 00:43:21 +, [EMAIL PROTECTED] (Wolf Blaum) said:

> As I understand Biology, there is 4 nucleotid acids which gives 4**2
> combinaions for dupplets. So you need 8 vars to count the occourence of
> all douplets. Worse for triplets. (24)
> As I understand genetics, triplets are what matters, since the rma
> transcriptase reads triplets as code of amino acids. You might give my
> updates un my biol. knowledge:-)

Wolf -

It's been a while since my A-Level biology days, but I believe you're 
correct. However, this particular coursework was to create two programs 
for a different purpose than I think you're imagining:

transition.pl: returns tables of transition probabilities for plus and 
minus models (exon and non-exon regions) as well as beta values 
(log-odds ratios) to compare the two models.

The transition probability for AT for example (the probability that 
adenine will be followed by thymine) is calculated thus:

tp(AT) = |AT| / |A_|

The total number of occurrences of "AT" divided by the total number of 
"A" followed by anything.

The program can also write the transition probabilities to a file to be 
used as input for the other program...
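
The formula above can be sketched in a few lines of Perl. This is only an
illustration under stated assumptions, not the actual transition.pl: the sub
name transition_probabilities is made up, and the input is the sample
sequence quoted earlier in the thread.

```perl
use strict;
use warnings;

# transition_probabilities is a hypothetical helper name for this sketch.
# Returns a hashref mapping each observed pair XY to |XY| / |X_|.
sub transition_probabilities {
    my ($sequence) = @_;
    my (%pair, %first);
    for my $i (0 .. length($sequence) - 2) {
        my $xy = lc substr($sequence, $i, 2);
        $pair{$xy}++;                     # |XY|
        $first{ substr($xy, 0, 1) }++;    # |X_|: X followed by anything
    }
    my %tp;
    for my $xy (keys %pair) {
        $tp{$xy} = $pair{$xy} / $first{ substr($xy, 0, 1) };
    }
    return \%tp;
}

my $tp = transition_probabilities('caggaactttcggaagaccatgta');
printf "tp(%s) = %.3f\n", $_, $tp->{$_} for sort keys %$tp;
```

The last letter of the sequence is not counted in |X_| because nothing
follows it, so the probabilities for each first letter sum to 1.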

simulation.pl: which asks the user to specify the length of the 
sequence they want, then generates it according to the model file used 
as input (by simulating a Markov chain). So if you supply a file 
containing the transition probabilities of a typical exon (coding) 
region, the simulation will use them to generate a typical exon 
sequence.
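
A rough sketch of such a Markov-chain generator follows. It is not the
actual simulation.pl: the sub name simulate, the nested-hash layout of the
model, and passing the first letter in as an argument are all assumptions
made for this example.

```perl
use strict;
use warnings;

# simulate is a hypothetical helper. $tp->{x}{y} holds the probability that
# y follows x; $start is the first letter, chosen by the caller.
sub simulate {
    my ($tp, $length, $start) = @_;
    my $seq = $start;
    while (length($seq) < $length) {
        my $row = $tp->{ substr($seq, -1) } or last;    # no row: stop early
        my $r = rand();                                 # uniform in [0, 1)
        my $next;
        for my $y (sort keys %$row) {
            $next = $y;          # stays on the last letter if rounding leaves $r >= 0
            $r -= $row->{$y};
            last if $r < 0;
        }
        last unless defined $next;                      # empty row: stop early
        $seq .= $next;
    }
    return $seq;
}

# A degenerate model in which a is always followed by c and c by a.
print simulate({ a => { c => 1 }, c => { a => 1 } }, 6, 'a'), "\n";
```

Each step draws one random number and walks the row for the previous letter
until the cumulative probability passes it.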

Thanks very much to everyone who's offered further advice on this 
problem, I know now that my method of counting the dinucleotides in the 
input sequence is a little brain-dead. However, it works, and I've 
learnt from it. I'm looking forward to my next foray into the world of 
Perl.

Regards,

Henry.




Re: Pattern matching problem

2004-02-25 Thread R. Joseph Newton
Kenton Brede wrote:

> On Wed, Feb 25, 2004 at 05:52:19PM -, Rob Dixon ([EMAIL PROTECTED]) wrote:
> > Kenton Brede wrote:
> > >
> > > > I'm having trouble counting the number of specific substrings within a
> > > > string. I'm working on a bioinformatics coursework at the moment, so my
> > > > string looks like this:
> > >
> > > If you don't get an answer to your question this is probably why -
> > >
> > > http://learn.perl.org/beginners-faq#2.2%20%20what%20is%20this%20list%20_not_%20for
> >
> > Hi Kent.
> >
> > Which of that list did you think was relevant?
>
> "Homework" am I wrong?  He said he was working on "bioinformatics
> coursework."  If I'm wrong I apologize for opening my "mouth."
> Kent

I see nothing wrong with openly seeking input on any question.  That is part of the
process of active learning.  What I find objectionable is people looking for completed
assignments, or who just want to plug their data into a template, without trying to
understand for themselves why the code works.

Actually, I think the OP had a pretty good question.  I am not sure if the standard
regex would work for what he is trying to do.

We did have an extensive thread on a very similar problem in the last couple weeks, I
believe.  My guess here, since the desired string length is constant, is that simply
progressing forward through the string, testing substrings of two for equality, would
do the job in a straightforward manner.

Most of the veterans on this list are pretty skillful at providing help in ways that
still require active participation on the part of the person posting.  There is an art
to it.

Joseph







Re: Pattern matching problem

2004-02-25 Thread wolf blaum
On Wednesday 25 February 2004 17:35, Henry Todd generously enriched virtual 
reality by making up this one:

Hi, 

> I'm having trouble counting the number of specific substrings within a
> string. I'm working on a bioinformatics coursework at the moment, so my
> string looks like this:
>
> $sequence = "caggaactttcggaagaccatgta";
>
> I want to count the number of occurrences of each pair of letters, for
> example:
>
> Number of occurrences of "aa"
> Number of occurrences of "gt"
> Number of occurrences of "cc"
>
> This is how I'm counting the number of "cc" pairs at the moment ($cc is
> my counter variable):
>
> $cc++ while $sequence =~ /cc/gi;
>

As I understand biology, there are 4 nucleotides, which gives 4**2
combinations for doublets. So you need 8 vars to count the occurrence of all
doublets. Worse for triplets. (24)
As I understand genetics, triplets are what matters, since the RNA
transcriptase reads triplets as code for amino acids.
You might give me updates on my biol. knowledge:-)

To make your code reusable in upcoming coursework I suggest:

---snip---

#! /usr/bin/perl

use strict;
use warnings;


my %wmers;
my $sequence = "caggaactttcggaagaccatgta";
my $wordsize = 2;

for (my $i=0;$i < length($sequence) - $wordsize;$i++){
  $wmers{substr($sequence,$i,$wordsize)}++;
}

foreach (keys %wmers) {
 print "$_ => $wmers{$_}\n";
} 

---snap---

prints on my box:

---snip---

#~> ./gataca.pl
at => 1
ct => 2
ag => 2
tt => 1
cc => 4
aa => 2
gt => 1
ga => 3
tg => 1
ca => 2
tc => 2
gg => 2
cg => 1
ac => 2

---snap---

The idea is simple: imitate the RNA transcriptase (I know you are talking
about DNA, but does that matter?) by sliding a $wordsize window over the
sequence.
For each window content, increment the value of the corresponding hash
field, creating it if necessary.

I bet there is a smarter solution using pos and regexes and a character
class [gatc]{$wordsize}; that would even make the thing usable for proteins
by changing the character class to the protein alphabet.

But I'm getting OT here. Maybe I should have done something else for a
living:-)

Enjoy (and reproduce), Wolf






RE: Pattern matching problem

2004-02-25 Thread David le Blanc
> From: Bakken, Luke [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, 26 February 2004 4:59 AM
> To: Henry Todd; [EMAIL PROTECTED]
> Subject: RE: Pattern matching problem
> 
> > I'm having trouble counting the number of specific substrings 
> > within a 
> > string. I'm working on a bioinformatics coursework at the 
> > moment, so my 
> > string looks like this:
> > 
> > $sequence = "caggaactttcggaagaccatgta";
> > 
> > I want to count the number of occurrences of each pair of 
> > letters, for example:
> > 
> > Number of occurrences of "aa"
> > Number of occurrences of "gt"
> > Number of occurrences of "cc"
> > 
> > This is how I'm counting the number of "cc" pairs at the 
> > moment ($cc is 
> > my counter variable):
> > 
> > $cc++ while $sequence =~ /cc/gi;

I agree that the zero-width look-ahead solves the problem for
the two-character case of a single pattern, but what about the three
character or more case?  What about a search string fed in from the
command line?  What about handling a large number of search strings
possibly not known at compile time?

The following code replaces the 'one line regex' with some code which
breaks (compiles?) the search-space into a parse tree, and performs the
equivalent of 'zero-width look-ahead', but allows you to count a [very]
large number of [variable length] search items in parallel.  A number of
possible optimisations have been left out for readability and brevity.  
Please avoid searching for equivalent substrings (i.e., if you are searching
for 'thump', searching for 'hump' as well is pointless, unless you need
to count the number of times 'hump' appears without a 't' in front :)

If you want to know what '$ptree' looks like, just 'use Data::Dumper;' at
the top, and 'print Dumper($ptree).$/;'  at the bottom.

--- snip ---

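#!/usr/bin/perl
use strict;
use warnings;

my $cstr = 'caggaactttcggaagaccatgta';
my @set = qw(cc ca gg aa ttc);

# Convert '@set' into a parse tree (a trie keyed on single characters)
my $ptree = {};
my @rset;    # one reference per search string, pointing at its counter
for( @set ) {
    my $r = \$ptree;
    $r = \ $$r->{$_} for split //;
    push @rset, \$$r->{count};
}

# Parse string: from every start position, walk the tree as far as it goes
my @tok = split//,$cstr;
for( 0..$#tok )
{
    my $r = \$ptree;
    my $n=$_;
    # stop at the end of the string as well as at the end of a branch
    while( $n <= $#tok && exists $$r->{$tok[$n]} ) {
        $r=\$$r->{$tok[$n++]}
    }
    $$r->{count}++ if exists $$r->{count}
}

print "Matches Found:".$/;
for(0..$#set ){
    printf "%10s %d$/", $set[$_], ${$rset[$_]} || 0;
}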

% ./rs.pl
Matches Found:
cc 4
ca 2
gg 2
aa 2
   ttc 1

--- snip ---






Re: Pattern matching problem

2004-02-25 Thread Rob Dixon
Kenton Brede wrote:
>
> Well it seems there is confusion on my part as to which part of the FAQ
> to follow.  I'm sure there are tons of homework questions done for people
> who disguise them.  That is one reason I've always felt the "no homework
> rule" is superfluous.  Personally I have no problem with homework
> questions if people just give pointers and don't actually complete the
> homework assignment for the person.  It appears I'm the one adding the
> list noise.

It's the old, "Give a man some water" versus "Build a man a well" thing.

I have always hoped to encourage people to reach beyond what I can do.

I doubt that any homework tasks are solved by the list, and if they were
the thought processes that are posted will help many others. It takes a
very clever mind to accommodate a third-party solution without
understanding either the problem or the answer.

Rob



 




Re: Pattern matching problem

2004-02-25 Thread Rob Dixon
Kenton Brede wrote:
>
> OK my mistake.  I've been on newsgroups/lists where the "no homework rule"
> is enforced and just assumed the FAQ was literal, except for the
> "monkey" parts of course.
>
> I just didn't want the OP to be hanging waiting for an answer when non
> would be forthcoming.

Hmm. 'Forthcoming' is a good Anglo-Saxon word, so I'm on your side :)

Nearly all questions on this list get answered. Sometimes there are
rebukes, and sometimes they're undeserved. But I hope and believe
that all genuine questions are addressed and resolved.

Tell me if you see otherwise.

Rob




 




Re: Pattern matching problem

2004-02-25 Thread Henry Todd
Hi all -

Many thanks to those who shared their knowledge. I had a feeling that 
there would be an elegant solution to my problem, but I was having no 
luck figuring it out.

For reference, where before my code was:

$Pcc++ while $sequence =~ /cc/gi;

..it is now:

$Pcc++ while $sequence =~ /c(?=c)/gi;

I now know what a zero-width positive look-ahead assertion is! For 
anyone who's also struggling with this, read this page (the "Extended 
Patterns" section) and compare the different types of assertion:

http://www.perldoc.com/perl5.8.0/pod/perlre.html

My program is now counting *all* occurrences of "cc", including those 
that overlap.
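
In case it helps anyone else, here is a small sketch of how the same look-ahead trick might be wrapped up for any pair (or longer word). The sub name count_overlapping is my own invention, and it assumes the word is a non-empty literal string rather than a pattern:

```perl
use strict;
use warnings;

# Hypothetical helper: count all (possibly overlapping) occurrences of a
# literal word. Each match consumes only the first character; the rest of
# the word is checked by the zero-width look-ahead.
sub count_overlapping {
    my ($sequence, $word) = @_;
    my $first = quotemeta substr($word, 0, 1);
    my $rest  = quotemeta substr($word, 1);
    my $count = 0;
    $count++ while $sequence =~ /$first(?=$rest)/gi;
    return $count;
}

for my $pair (qw(aa gt cc)) {
    printf "%s: %d\n", $pair,
        count_overlapping('caggaactttcggaagaccatgta', $pair);
}
```

The quotemeta calls are there only so that a word containing regex metacharacters is still treated literally.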

Looking back at the "/cc/" pattern, it seems obvious that it won't 
work, but only now that I understand how the matches are found. I've 
read through a number of 'Introductions to regular expressions in Perl' 
type documents for this coursework, and I found no indication that my 
initial pattern wouldn't work as I imagined. I'm not sure if that was 
my misunderstanding of the material, or a limitation of the 
documentation (in not pointing out the limitation of the "/cc/" 
pattern). It's probably the former.

Anyway, I'm rambling again. Thanks for the help everyone.

Regards,

Henry.

 



Re: Pattern matching problem

2004-02-25 Thread WC -Sx- Jones
Kenton Brede wrote:

> I just didn't want the OP to be hanging waiting for an answer when none
> would be forthcoming.

Not a mistake per se -- however Perl people (read POD) will always want 
to show off -- so, if it is Perl, it is likely answered.

:)
-Sx-
(let's not mention cpl.mod)
__
We are the CLPM... Lower your standards and surrender your code...
We will add your biological and technological distinctiveness to
our own... Your thoughts will adapt to service us...
...Resistance is futile...
 



Re: Pattern matching problem

2004-02-25 Thread Kenton Brede
On Wed, Feb 25, 2004 at 06:12:55PM +, Henry Todd ([EMAIL PROTECTED]) wrote:
> On 2004-02-25 17:42:46 +, [EMAIL PROTECTED] (Kenton Brede) said:
> 
> >If you don't get an answer to your question this is probably why -
> >
> >http://learn.perl.org/beginners-faq#2.2%20%20what%20is%20this%20list%20_not_%20for
> 
> 
> Kent -
> 
> Thanks for the pointer. I should have read the list rules before 
> posting, I apologise for adding noise.
> 
> I'm not asking for anyone to do my coursework for me, and I'm sorry if 
> my question made it seem that way. All I need is a pointer to some 
> relevant documentation and I'll be happy to work it all out myself.
> 
> I could write this program in C++ or Java easily enough, but I wanted 
> to use this coursework as an excuse to introduce myself to Perl (I 
> understand Perl is better suited for text analysis and manipulation 
> tasks such as this). This pattern match is the only bit of my program 
> that isn't working right yet. Oh well, if I can't figure it out myself, 
> I can always do all the parsing manually like I would do in C++/Java -- 
> I was hoping there'd be a nicer way of doing it is all.
> 
> As an aside, I wonder if my question would be answered if I didn't 
> mention that this was for my coursework? This is a beginners group 
> after all -- I wonder how many questions are answered here that *are* 
> for courseworks, but not declared as such.

Well it seems there is confusion on my part as to which part of the FAQ
to follow.  I'm sure there are tons of homework questions done for people
who disguise them.  That is one reason I've always felt the "no homework
rule" is superfluous.  Personally I have no problem with homework
questions if people just give pointers and don't actually complete the
homework assignment for the person.  It appears I'm the one adding the
list noise.  
Kent   

-- 
"Efficiency is intelligent laziness."
  -David Dunham

 




Re: Pattern matching problem

2004-02-25 Thread Kenton Brede
On Wed, Feb 25, 2004 at 06:30:55PM -, Rob Dixon ([EMAIL PROTECTED]) wrote:
> Kenton Brede wrote:
> >
> > On Wed, Feb 25, 2004 at 05:52:19PM -, Rob Dixon ([EMAIL PROTECTED]) wrote:
> > > Kenton Brede wrote:
> > > >
> > > > > I'm having trouble counting the number of specific substrings within a
> > > > > string. I'm working on a bioinformatics coursework at the moment, so my
> > > > > string looks like this:
> > > >
> > > > If you don't get an answer to your question this is probably why -
> > > >
> > > > http://learn.perl.org/beginners-faq#2.2%20%20what%20is%20this%20list%20_not_%20for
> > >
> > > Hi Kent.
> > >
> > > Which of that list did you think was relevant?
> >
> > "Homework" am I wrong?  He said he was working on "bioinformatics
> > coursework."  If I'm wrong I apologize for opening my "mouth."
> 
> Thx Kent.
> 
> There's no by-line on this site, so I don't know who wrote it. This is
> what it says:
> 
>   2.2 - What is this list _not_ for?
> 
>   - SPAM
>   - Homework
>   - Solicitation
>   - Things that aren't Perl related
>   - Monkeys
>   - Monkeys solicitating homework on non-Perl related SPAM.
> 
> This is gratuitous. Apart from being a redundant structure
> which includes both 'all monkeys' and 'some monkeys', I think
> this is supposed to be humorous. Those outside the US will
> be asking, "What's a monkey?", and those inside it will become
> one.

OK my mistake.  I've been on newsgroups/lists where the "no homework rule" 
is enforced and just assumed the FAQ was literal, except for the
"monkey" parts of course.

I just didn't want the OP to be hanging waiting for an answer when none
would be forthcoming.
Kent 

-- 
"Efficiency is intelligent laziness."
  -David Dunham

 




Re: Pattern matching problem

2004-02-25 Thread Rob Dixon
Henry Todd wrote:
>
> On 2004-02-25 17:42:46 +, [EMAIL PROTECTED] (Kenton Brede) said:
>
> > If you don't get an answer to your question this is probably why -
> >
> > http://learn.perl.org/beginners-faq#2.2%20%20what%20is%20this%20list%20_not_%20for
>
> Thanks for the pointer. I should have read the list rules before
> posting, I apologise for adding noise.

I don't believe you have anything to apologise for.

"Blood Makes Noise", Suzanne Vega

Rob



 




Re: Pattern matching problem

2004-02-25 Thread John W. Krahn
Henry Todd wrote:
> 
> I'm having trouble counting the number of specific substrings within a
> string. I'm working on a bioinformatics coursework at the moment, so my
> string looks like this:
> 
> $sequence = "caggaactttcggaagaccatgta";
> 
> I want to count the number of occurrences of each pair of letters, for example:
> 
> Number of occurrences of "aa"
> Number of occurrences of "gt"
> Number of occurrences of "cc"
> 
> This is how I'm counting the number of "cc" pairs at the moment ($cc is
> my counter variable):
> 
> $cc++ while $sequence =~ /cc/gi;
> 
> But this only matches the literal string "cc", so if, as it scans
> $sequence, it finds "cccc" it's only counting it once instead of three
> times.
> 
> What pattern do I need to be looking for in the $sequence if I want to
> count *all* occurrences of "cc" -- even if they overlap?

Use a zero-width positive look-ahead assertion for the second repeated
character.

$ perl -le'
my $string = q/cccc/;
my $count = () = $string =~ /c(?=c)/g;
print $count;
'
3
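
The "= () =" part may look odd, so here is a short side-by-side sketch, assuming a test string of four c's. Assigning the match to the empty list forces list context, in which /g returns every match, and assigning that list assignment to a scalar yields the number of matches:

```perl
use strict;
use warnings;

my $string = 'cccc';

# List context makes /g return all matches; the scalar on the left
# receives how many elements that list had.
my $plain   = () = $string =~ /cc/g;       # each match consumes two characters
my $overlap = () = $string =~ /c(?=c)/g;   # look-ahead consumes only one

print "without look-ahead: $plain\n";
print "with look-ahead:    $overlap\n";
```

Without the look-ahead the count is 2, because the second match can only start after the first one has ended. With it the count is 3.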


John
-- 
use Perl;
program
fulfillment

 




Re: Pattern matching problem

2004-02-25 Thread david
Henry Todd wrote:

[snip]

> 
> This is how I'm counting the number of "cc" pairs at the moment ($cc is
> my counter variable):
> 
> $cc++ while $sequence =~ /cc/gi;
> 
> But this only matches the literal string "cc", so if, as it scans
> $sequence, it finds "cccc" it's only counting it once instead of three
> times.
>

Are you sure it counts it once rather than twice?

[panda]# perl -le '$cc++ while('cccc' =~ /cc/g); print $cc'
2
[panda]#

> 
> What pattern do I need to be looking for in the $sequence if I want to
> count *all* occurences of "cc" -- even if they overlap?
> 

all you need is a little look ahead:

#!/usr/bin/perl -w
use strict;

my $cc = 0;

while('cccc' =~ /c(?=c)/g){
    $cc++;
}

print $cc,"\n";

__END__

prints:

3

which is what you want right?

david
-- 
sub'_{print"@_ ";* \ = * __ ,\ & \}
sub'__{print"@_ ";* \ = * ___ ,\ & \}
sub'___{print"@_ ";* \ = *  ,\ & \}
sub'{print"@_,\n"}&{_+Just}(another)->(Perl)->(Hacker)

 




Re: Pattern matching problem

2004-02-25 Thread Rob Dixon
Kenton Brede wrote:
>
> On Wed, Feb 25, 2004 at 05:52:19PM -, Rob Dixon ([EMAIL PROTECTED]) wrote:
> > Kenton Brede wrote:
> > >
> > > > I'm having trouble counting the number of specific substrings within a
> > > > string. I'm working on a bioinformatics coursework at the moment, so my
> > > > string looks like this:
> > >
> > > If you don't get an answer to your question this is probably why -
> > >
> > > http://learn.perl.org/beginners-faq#2.2%20%20what%20is%20this%20list%20_not_%20for
> >
> > Hi Kent.
> >
> > Which of that list did you think was relevant?
>
> "Homework" am I wrong?  He said he was working on "bioinformatics
> coursework."  If I'm wrong I apologize for opening my "mouth."

Thx Kent.

There's no by-line on this site, so I don't know who wrote it. This is
what it says:

  2.2 - What is this list _not_ for?

  - SPAM
  - Homework
  - Solicitation
  - Things that aren't Perl related
  - Monkeys
  - Monkeys solicitating homework on non-Perl related SPAM.

This is gratuitous. Apart from being a redundant structure
which includes both 'all monkeys' and 'some monkeys', I think
this is supposed to be humorous. Those outside the US will
be asking, "What's a monkey?", and those inside it will become
one.

If perl.beginners shouldn't help people who have chosen to try
to learn Perl, and are funded by the state, then, hell, I hang
up my hat.

Casey: what's your angle?

Rob



 




Re: Pattern matching problem

2004-02-25 Thread Henry Todd
On 2004-02-25 17:42:46 +, [EMAIL PROTECTED] (Kenton Brede) said:

> If you don't get an answer to your question this is probably why -
>
> http://learn.perl.org/beginners-faq#2.2%20%20what%20is%20this%20list%20_not_%20for
>
> Kent

Kent -

Thanks for the pointer. I should have read the list rules before 
posting, I apologise for adding noise.

I'm not asking for anyone to do my coursework for me, and I'm sorry if 
my question made it seem that way. All I need is a pointer to some 
relevant documentation and I'll be happy to work it all out myself.

I could write this program in C++ or Java easily enough, but I wanted 
to use this coursework as an excuse to introduce myself to Perl (I 
understand Perl is better suited for text analysis and manipulation 
tasks such as this). This pattern match is the only bit of my program 
that isn't working right yet. Oh well, if I can't figure it out myself, 
I can always do all the parsing manually like I would do in C++/Java -- 
I was hoping there'd be a nicer way of doing it is all.

As an aside, I wonder if my question would be answered if I didn't 
mention that this was for my coursework? This is a beginners group 
after all -- I wonder how many questions are answered here that *are* 
for courseworks, but not declared as such.

Anyway, I'm rambling. Thanks again Kent.

Regards,

Henry.

 



Re: Pattern matching problem

2004-02-25 Thread Kenton Brede
On Wed, Feb 25, 2004 at 05:52:19PM -, Rob Dixon ([EMAIL PROTECTED]) wrote:
> Kenton Brede wrote:
> >
> > > I'm having trouble counting the number of specific substrings within a
> > > string. I'm working on a bioinformatics coursework at the moment, so my
> > > string looks like this:
> >
> > If you don't get an answer to your question this is probably why -
> >
> > http://learn.perl.org/beginners-faq#2.2%20%20what%20is%20this%20list%20_not_%20for
> 
> Hi Kent.
> 
> Which of that list did you think was relevant?

"Homework" am I wrong?  He said he was working on "bioinformatics
coursework."  If I'm wrong I apologize for opening my "mouth."  
Kent  

-- 
"Efficiency is intelligent laziness."
  -David Dunham

 




Re: Pattern matching problem

2004-02-25 Thread Rob Dixon
Kenton Brede wrote:
>
> > I'm having trouble counting the number of specific substrings within a
> > string. I'm working on a bioinformatics coursework at the moment, so my
> > string looks like this:
>
> If you don't get an answer to your question this is probably why -
>
> http://learn.perl.org/beginners-faq#2.2%20%20what%20is%20this%20list%20_not_%20for

Hi Kent.

Which of that list did you think was relevant?

Rob



 




RE: Pattern matching problem

2004-02-25 Thread Bakken, Luke
> I'm having trouble counting the number of specific substrings 
> within a 
> string. I'm working on a bioinformatics coursework at the 
> moment, so my 
> string looks like this:
> 
> $sequence = "caggaactttcggaagaccatgta";
> 
> I want to count the number of occurrences of each pair of 
> letters, for example:
> 
> Number of occurrences of "aa"
> Number of occurrences of "gt"
> Number of occurrences of "cc"
> 
> This is how I'm counting the number of "cc" pairs at the 
> moment ($cc is 
> my counter variable):
> 
> $cc++ while $sequence =~ /cc/gi;
> 
> But this only matches the literal string "cc", so if, as it scans 
> $sequence, it finds "cccc" it's only counting it once instead 
> of three 
> times.
> 
> What pattern do I need to be looking for in the $sequence if 
> I want to 
> count *all* occurrences of "cc" -- even if they overlap?

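use strict;
use warnings;

my $sequence = "caggaactttcggaagaccatgta";
my $cc = 0;
while ( $sequence =~ /cc/gi ) {
    print "$` $& $'\n";    # text before, matched text, text after
    ++$cc;
    # step back one character so the next search can overlap this match
    pos($sequence) = pos($sequence) - 1;
}
print "$cc\n";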
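
Stepping back by one only suits a two-character pattern. A possible generalisation, sketched here with a made-up sub name (count_with_pos) and assuming the pattern always matches at least one character, is to restart one character past the start of the previous match, which Perl records in $-[0]:

```perl
use strict;
use warnings;

# Hypothetical generalisation: after each match, resume the search one
# character past where that match STARTED, so patterns of any length
# can overlap with themselves.
sub count_with_pos {
    my ($sequence, $pattern) = @_;
    my $count = 0;
    while ($sequence =~ /$pattern/gi) {
        $count++;
        pos($sequence) = $-[0] + 1;   # $-[0] is the offset of the match start
    }
    return $count;
}

print count_with_pos('caggaactttcggaagaccatgta', 'ga'), "\n";
```

Unlike the look-ahead version, this leaves the pattern itself untouched, which may be handier when the pattern arrives from the command line.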

 




Re: Pattern matching problem

2004-02-25 Thread Kenton Brede
On Wed, Feb 25, 2004 at 04:35:57PM +, Henry Todd ([EMAIL PROTECTED]) wrote:
> I'm having trouble counting the number of specific substrings within a 
> string. I'm working on a bioinformatics coursework at the moment, so my 
> string looks like this:

If you don't get an answer to your question this is probably why -

http://learn.perl.org/beginners-faq#2.2%20%20what%20is%20this%20list%20_not_%20for

Kent

-- 
"Efficiency is intelligent laziness."
  -David Dunham

 




Pattern matching problem

2004-02-25 Thread Henry Todd
I'm having trouble counting the number of specific substrings within a 
string. I'm working on a bioinformatics coursework at the moment, so my 
string looks like this:

$sequence = "caggaactttcggaagaccatgta";

I want to count the number of occurrences of each pair of letters, for example:

Number of occurrences of "aa"
Number of occurrences of "gt"
Number of occurrences of "cc"
This is how I'm counting the number of "cc" pairs at the moment ($cc is 
my counter variable):

$cc++ while $sequence =~ /cc/gi;

But this only matches the literal string "cc", so if, as it scans 
$sequence, it finds "cccc" it's only counting it once instead of three 
times.

What pattern do I need to be looking for in the $sequence if I want to 
count *all* occurrences of "cc" -- even if they overlap?

I apologise if this is an already documented problem, I've tried a 
number of Google Groups searches, as well as searches on 
learn.perl.org, but without finding an answer.

Many thanks for any help offered.

Henry.

 



Re: file path pattern matching problem.

2003-12-10 Thread B. Fongo
The best way to do it is to use the standard module File::Basename.
For instance:

use File::Basename;

# On Windows this should return "somefile.txt" (pass '.txt' as a
# second argument if you want the suffix stripped).
$file_name = basename('c:\test\abc\what\somefile.txt');

# This should return "c:\test\abc\what" (no trailing backslash).
$dir_name = dirname('c:\test\abc\what\somefile.txt');

# fileparse returns file, directory and suffix; the suffix is only
# split off if you supply a suffix pattern.
($filename, $dir, $suffix) = fileparse('c:\test\abc\what\somefile.txt', qr/\.[^.]*/);

Check perldoc File::Basename for details.
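
A self-contained sketch of the above that should behave the same on any platform. It assumes you want Windows rules regardless of where the script runs, which is what the fileparse_set_fstype call is for; the variable names are only illustrative:

```perl
use strict;
use warnings;
use File::Basename;   # exports fileparse, basename, dirname, fileparse_set_fstype

# Treat paths as Windows paths even when this runs on another OS.
fileparse_set_fstype('MSWin32');

my $path = 'c:\test\abc\what\somefile.txt';

my $file_name = basename($path);   # file name including the extension
my $dir_name  = dirname($path);    # directory without a trailing backslash

# The suffix is only split off when a suffix pattern is supplied.
my ($name, $dir, $suffix) = fileparse($path, qr/\.[^.]*/);

print "basename:  $file_name\n";
print "dirname:   $dir_name\n";
print "fileparse: $name | $dir | $suffix\n";
```

Note that fileparse keeps the trailing backslash on the directory part while dirname drops it.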

HTH
Babs
Ben Crane wrote:

Hi all,

I'm trying to split apart a filepath...e.g: 
c:\test\abc\what\somefile.txt
The length of the filepath will never be constant...

e.g:
foreach $line (@Path_Filename)
  {
chomp($line);
(@Path_Breakdown) = split(/(\w+\W)(\w+\W)/, $line);
  }
but my biggest problem is how to match a word
character \w then match everything until the last
\...that will comprise of the file path and the final
\ onwards will be the filename incl. or excl. the file
extension...
I've tried to get the pattern matching to include
everything including the \ but it doesn't seem to
work. The closest I've gotten is:
c:\test\abc\what\somefile.txt
c:
\test\abc\what\
somefile.
txt
Any ideas? Is there a pattern character that I'm
missing here that allows you to match a certain
character and then stop if it's the last one of it's
type?
Ben

 



RE: file path pattern matching problem.

2003-12-10 Thread Tom Kinzer
Yes! And use Basename too.

these will also give you the advantage of making your programs more
portable!

-Tom Kinzer


-Original Message-
From: John W. Krahn [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 10, 2003 11:37 AM
To: [EMAIL PROTECTED]
Subject: Re: file path pattern matching problem.


Ben Crane wrote:
>
> Hi all,

Hello,

> I'm trying to split apart a filepath...e.g:
> c:\test\abc\what\somefile.txt
> The length of the filepath will never be constant...


$ perl -le'
use File::Spec;

my $path = q[c:\test\abc\what\somefile.txt];

my ( $vol, $dir, $file ) = File::Spec->splitpath( $path );
print qq[ "$vol"  "$dir"  "$file" ];

my @dirs = File::Spec->splitdir( $dir );
print map qq[ "$_" ], @dirs;

'
 "c:"  "\test\abc\what\"  "somefile.txt"
 ""  "test"  "abc"  "what"  ""



John
--
use Perl;
program
fulfillment





Re: file path pattern matching problem.

2003-12-10 Thread John W. Krahn
Ben Crane wrote:
> 
> Hi all,

Hello,

> I'm trying to split apart a filepath...e.g:
> c:\test\abc\what\somefile.txt
> The length of the filepath will never be constant...


$ perl -le'
use File::Spec;

my $path = q[c:\test\abc\what\somefile.txt];

my ( $vol, $dir, $file ) = File::Spec->splitpath( $path );
print qq[ "$vol"  "$dir"  "$file" ];

my @dirs = File::Spec->splitdir( $dir );
print map qq[ "$_" ], @dirs; 
 
'
 "c:"  "\test\abc\what\"  "somefile.txt" 
 ""  "test"  "abc"  "what"  "" 

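The output above is what File::Spec gives on Windows. As a sketch of a variant, assuming the script may also run elsewhere, the Win32 class can be called directly so that Windows rules apply on any platform:

```perl
use strict;
use warnings;
use File::Spec::Win32;

my $path = 'c:\test\abc\what\somefile.txt';

# Calling the Win32 class directly applies Windows path rules everywhere.
my ($vol, $dir, $file) = File::Spec::Win32->splitpath($path);
my @dirs = File::Spec::Win32->splitdir($dir);

print qq["$vol" "$dir" "$file"\n];
print join(' ', map(qq["$_"], @dirs)), "\n";
```

The empty strings at both ends of @dirs come from the leading and trailing backslashes in the directory part.
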


John
-- 
use Perl;
program
fulfillment

 




RE: file path pattern matching problem.

2003-12-10 Thread Balint, Jess
Ben -

You can use the File::Basename module for this:

Your program would be akin to:

use File::Basename;

foreach $line (@Path_Filename)
{
    chomp($line);
    $filename = basename($line); # gives you the filename with the extension
    $location = dirname($line);  # gives you the location with no trailing /
}

Here are some examples:

~% perl -MFile::Basename -e'print basename($ARGV[0])' /etc/hosts.equiv
hosts.equiv
~% perl -MFile::Basename -e'print dirname($ARGV[0])' /etc/hosts.equiv
/etc
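
If you would rather stay with a regex, as the original question asks, a minimal sketch follows. This is an alternative to the module, not how File::Basename works internally, and it assumes backslash is the only separator. A greedy .* runs as far as it can, so the backslash that follows it is necessarily the last one in the string:

```perl
use strict;
use warnings;

my $line = 'c:\test\abc\what\somefile.txt';

# Greedy .* grabs as much as possible, so the backslash after it is the
# LAST backslash in the string; [^\\]* is whatever follows it.
my ($location, $filename) = $line =~ /^(.*\\)([^\\]*)$/
    or die "no backslash in '$line'\n";

print "location: $location\n";
print "filename: $filename\n";
```

The module is still the safer choice, since it also copes with forward slashes and drive-relative paths.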

~ Jess

> -Original Message-
> From: Ben Crane [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, December 10, 2003 9:22 AM
> To: [EMAIL PROTECTED]
> Subject: file path pattern matching problem.
> 
> 
> Hi all,
> 
> I'm trying to split apart a filepath...e.g: 
> c:\test\abc\what\somefile.txt
> The length of the filepath will never be constant...
> 
> e.g:
> foreach $line (@Path_Filename)
>{
>   chomp($line);
> (@Path_Breakdown) = split(/(\w+\W)(\w+\W)/, $line);
>}
> 
> but my biggest problem is how to match a word
> character \w then match everything until the last
> \...that will comprise of the file path and the final
> \ onwards will be the filename incl. or excl. the file
> extension...
> 
> I've tried to get the pattern matching to include
> everything including the \ but it doesn't seem to
> work. The closest I've gotten is:
> 
> c:\test\abc\what\somefile.txt
> c:
> \test\abc\what\
> somefile.
> txt
> 
> Any ideas? Is there a pattern character that I'm
> missing here that allows you to match a certain
> character and then stop if it's the last one of it's
> type?
> 
> Ben
> 
> 
> __
> Do you Yahoo!?
> New Yahoo! Photos - easier uploading and sharing.
> http://photos.yahoo.com/
> 
> -- 
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> <http://learn.perl.org/> <http://learn.perl.org/first-response>
> 
> 
> 





file path pattern matching problem.

2003-12-10 Thread Ben Crane
Hi all,

I'm trying to split apart a filepath...e.g: 
c:\test\abc\what\somefile.txt
The length of the filepath will never be constant...

e.g:
foreach $line (@Path_Filename)
   {
chomp($line);
(@Path_Breakdown) = split(/(\w+\W)(\w+\W)/, $line);
   }

but my biggest problem is how to match a word
character \w then match everything until the last
\...that will comprise of the file path and the final
\ onwards will be the filename incl. or excl. the file
extension...

I've tried to get the pattern matching to include
everything including the \ but it doesn't seem to
work. The closest I've gotten is:

c:\test\abc\what\somefile.txt
c:
\test\abc\what\
somefile.
txt

Any ideas? Is there a pattern character that I'm
missing here that allows you to match a certain
character and then stop if it's the last one of its
type?

Ben


 




Re: Pattern matching problem

2001-09-20 Thread Pete Sergeant


Some other notes...

You don't have to use printf - you can use print. And you don't need the
brackets, or the inverted commas around $path:

if ($path =~ m/^\$/) {
print "Path is env var\n";
} else {
print "Working on phys.dir\n";
}


> > if ( "$path" =~ /^$/ ) {
> >printf("path is env var\n");
> >}
> >else {
> > printf ("working on phys.dir\n");
> >}
> > And it's not working.
> > What's wrong ?
> > Thanks in advance.
> >
> > --
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
>







Re: Pattern matching problem

2001-09-20 Thread Sudarsan Raghavan

/^$/ matches a blank line, /^\$/ will do the job for you.
$ is a metacharacter, you will have to escape it. It matches at the end of a
line or before newline at the end.
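
A small sketch of the difference, using the two paths from the configuration file plus an empty string. The sub name starts_with_env_var is only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the path starts with a literal dollar sign.
sub starts_with_env_var {
    my ($path) = @_;
    return $path =~ /^\$/ ? 1 : 0;   # \$ is a literal '$', not an anchor
}

for my $path ('$PRIVATE_LOG/old_files', '/var/tmp/tanya*', '') {
    # The unescaped form only succeeds when the string is empty
    # (or holds nothing but a newline).
    my $unescaped = $path =~ /^$/ ? 1 : 0;
    printf "%-26s leading dollar: %d   empty string: %d\n",
        "'$path'", starts_with_env_var($path), $unescaped;
}
```

The paths are single-quoted on purpose, so that Perl does not try to interpolate $PRIVATE_LOG as a variable.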

hth.
Sudarsan

Tanya Bar wrote:

>
> Path could be physical or start with  environment  variable; so in my script
> I'm trying to check if the first character of $path is "$";
> I tried it this way :
> if ( "$path" =~ /^$/ ) {
>printf("path is env var\n");
>}
>else {
> printf ("working on phys.dir\n");
>}
> And it's not working.
> What's wrong ?
> Thanks in advance.
>
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]






Pattern matching problem

2001-09-19 Thread Tanya Bar

Hi, All!
I'm very new to Perl, so maybe I'm doing something wrong.
Please help me with this:
my script reads a configuration file that looks like this:
#-
#hostname  #username  #Path                   #gzip after  #delete after
hpn003     ctanya     $PRIVATE_LOG/old_files  15           40
hpn003     ctanya     /var/tmp/tanya*         20           60
# -
Path could be physical or start with  environment  variable; so in my script
I'm trying to check if the first character of $path is "$";
I tried it this way :
if ( "$path" =~ /^$/ ) {
   printf("path is env var\n");
   }
   else {
printf ("working on phys.dir\n");
   }
And it's not working.
What's wrong ?
Thanks in advance.




