subject:"regex problem"

Re: Help me with a regex problem

2019-10-26 Thread Dermot

You might consider using Regexp::Common::net. It provides a convenient set
of functions for matching IP v4, v6 and mac addresses.


https://metacpan.org/pod/Regexp::Common::net

On Fri, 25 Oct 2019 at 19:43, John W. Krahn  wrote:

> On 2019-10-25 3:23 a.m., Maggie Q Roth wrote:
> >   Hello
>
> Hello.
>
> > There are two primary types of lines in the log:
>
> What are those two types?  How do you define them?
>
>
> > 60.191.38.xx/
> > 42.120.161.xx   /archives/1005
>
>  From my point of view those two lines have two fields, the first looks
> like an IP address and the second looks like a file path.  In other
> words I can't distinguish the difference between these two "types".
>
>
> > I know how to write regex to match each line, but don't get the good
> result
> > with one regex to match both lines.
> >
> > Can you help?
>
> Perhaps if you could describe the problem better?
>
>
> John
>
> --
> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
> For additional commands, e-mail: beginners-h...@perl.org
> http://learn.perl.org/
>
>
>

Re: Help me with a regex problem

2019-10-25 Thread John W. Krahn


On 2019-10-25 3:23 a.m., Maggie Q Roth wrote:
> Hello

Hello.

> There are two primary types of lines in the log:

What are those two types?  How do you define them?

> 60.191.38.xx        /
> 42.120.161.xx       /archives/1005

From my point of view those two lines have two fields: the first looks
like an IP address and the second looks like a file path.  In other
words, I can't distinguish between these two "types".

> I know how to write regex to match each line, but don't get the good result
> with one regex to match both lines.
>
> Can you help?

Perhaps if you could describe the problem better?


John


Re: Help me with a regex problem

2019-10-25 Thread Andy Bach

/(?<ip>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s+(?<path>\/.*)/

To avoid the "leaning toothpick" problem, Perl lets you use different match
delimiters, so the above is the same as:
m#(?<ip>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s+(?<path>/.*)#

I assume you want to capture the IP and the path, right?
if ( $entry =~ m#([\d.]+)\s+(/\S*)# ) {
   my ($ip, $path) = ($1, $2);
   print "IP $ip asked for path $path\n";
}
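
The two sample lines differ only in what follows the slash, so one pattern can cover both as long as the path part is allowed to be a bare "/". What follows is a minimal sketch, not a definitive answer. It assumes the real log holds full dotted-quad addresses (the "xx" in the samples looks like anonymisation), and parse_entry and the capture names ip and path are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# parse_entry is a hypothetical helper, not part of the original thread.
# It returns (ip, path) for a log line, or an empty list if the line
# does not look like "<ipv4> <whitespace> </path>".
sub parse_entry {
    my ($entry) = @_;
    if ( $entry =~ m#^(?<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})\s+(?<path>/\S*)# ) {
        return ( $+{ip}, $+{path} );
    }
    return;
}

for my $line ( '60.191.38.1        /', '42.120.161.2   /archives/1005' ) {
    my ( $ip, $path ) = parse_entry($line);
    print "IP $ip asked for path $path\n";
}
```

Using /\S* rather than /\S+ is what lets the bare "/" line match; the pattern only checks the shape of the address, not that each number is 255 or less.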

On Fri, Oct 25, 2019 at 5:28 AM Илья Рассадин  wrote:

> For example, this regex
>
> /(?[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s+(?\/.*)/
>
> On 25.10.2019 13:23, Maggie Q Roth wrote:
> > Hello
> >
> > There are two primary types of lines in the log:
> >
> > 60.191.38.xx/
> > 42.120.161.xx   /archives/1005
> >
> > I know how to write regex to match each line, but don't get the good
> > result with one regex to match both lines.
> >
> > Can you help?
> >
> > Thanks,
> > Maggie
>
> --
> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
> For additional commands, e-mail: beginners-h...@perl.org
> http://learn.perl.org/
>
>
>

-- 

a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk

Re: Help me with a regex problem

2019-10-25 Thread Benjamin S Pendygraft II

That is a backslash followed by a forward slash, not a V. The backslash tells
the regex parser to treat the next character as a literal character. Useful for
matching periods, question marks, brackets, etc.
A period matches any single character (except a newline, by default) and an
asterisk matches the previous item zero or more times, so .* basically means
match everything up to the end of the line.
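
To make the \/ versus / point concrete, here is a small sketch showing the same capture written with both delimiters. The helper names path_with_slashes and path_with_hashes are made up for this example.

```perl
use strict;
use warnings;

# The same pattern written with two different delimiters.
# With /.../ the slash must be escaped as \/ ; with m#...# it need not be.
sub path_with_slashes { my ($s) = @_; return $s =~ /(\/.*)/ ? $1 : undef; }
sub path_with_hashes  { my ($s) = @_; return $s =~ m#(/.*)#  ? $1 : undef; }

print path_with_slashes('42.120.161.2   /archives/1005'), "\n";
print path_with_hashes('42.120.161.2   /archives/1005'),  "\n";
```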

Apologies if this is formatted incorrectly. Sending from my phone.

On Fri, Oct 25, 2019 at 06:37 Maggie Q Roth  wrote:

> what's V.*?
>
> Maggie
>
> On Fri, Oct 25, 2019 at 6:28 PM Илья Рассадин  wrote:
>
>> For example, this regex
>>
>> /(?[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s+(?\/.*)/
>>
>> On 25.10.2019 13:23, Maggie Q Roth wrote:
>> > Hello
>> >
>> > There are two primary types of lines in the log:
>> >
>> > 60.191.38.xx/
>> > 42.120.161.xx   /archives/1005
>> >
>> > I know how to write regex to match each line, but don't get the good
>> > result with one regex to match both lines.
>> >
>> > Can you help?
>> >
>> > Thanks,
>> > Maggie
>>
>> --
>> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
>> For additional commands, e-mail: beginners-h...@perl.org
>> http://learn.perl.org/
>>
>>
>> --
Benjamin Pendygraft

Re: Help me with a regex problem

2019-10-25 Thread X Dungeness

my $n = '[0-9]{1,3}';
if ( $line =~ m[ (?:$n\.){3} $n \s+ \S+ ]x )   # $line (assumed name) holds one log line
{
   # match
}


On Fri, Oct 25, 2019 at 3:37 AM Maggie Q Roth  wrote:

> what's V.*?
>
> Maggie
>
> On Fri, Oct 25, 2019 at 6:28 PM Илья Рассадин  wrote:
>
>> For example, this regex
>>
>> /(?[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s+(?\/.*)/
>>
>> On 25.10.2019 13:23, Maggie Q Roth wrote:
>> > Hello
>> >
>> > There are two primary types of lines in the log:
>> >
>> > 60.191.38.xx/
>> > 42.120.161.xx   /archives/1005
>> >
>> > I know how to write regex to match each line, but don't get the good
>> > result with one regex to match both lines.
>> >
>> > Can you help?
>> >
>> > Thanks,
>> > Maggie
>>
>> --
>> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
>> For additional commands, e-mail: beginners-h...@perl.org
>> http://learn.perl.org/
>>
>>
>>

Re: Help me with a regex problem

2019-10-25 Thread Maggie Q Roth

what's V.*?

Maggie

On Fri, Oct 25, 2019 at 6:28 PM Илья Рассадин  wrote:

> For example, this regex
>
> /(?[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s+(?\/.*)/
>
> On 25.10.2019 13:23, Maggie Q Roth wrote:
> > Hello
> >
> > There are two primary types of lines in the log:
> >
> > 60.191.38.xx/
> > 42.120.161.xx   /archives/1005
> >
> > I know how to write regex to match each line, but don't get the good
> > result with one regex to match both lines.
> >
> > Can you help?
> >
> > Thanks,
> > Maggie
>
> --
> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
> For additional commands, e-mail: beginners-h...@perl.org
> http://learn.perl.org/
>
>
>

Re: Help me with a regex problem

2019-10-25 Thread Илья Рассадин


For example, this regex

/(?<ip>[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s+(?<path>\/.*)/

On 25.10.2019 13:23, Maggie Q Roth wrote:

Hello

There are two primary types of lines in the log:

60.191.38.xx        /
42.120.161.xx       /archives/1005

I know how to write regex to match each line, but don't get the good 
result with one regex to match both lines.


Can you help?

Thanks,
Maggie



Help me with a regex problem

2019-10-25 Thread Maggie Q Roth

 Hello

There are two primary types of lines in the log:

60.191.38.xx        /
42.120.161.xx       /archives/1005

I know how to write regex to match each line, but don't get the good result
with one regex to match both lines.

Can you help?

Thanks,
Maggie

regex problem?

2015-11-25 Thread Rick T

The following code apparently is not doing what I wanted. My intention was to 
confirm that the general format of  $student_id was this: several uppercase 
letters followed by a hyphen followed by several digits. If not, it would 
trigger the die. Unfortunately it seems to always trigger the die. For example, 
if I let student_id = triplett-1, the script dies. I’m a beginner, so I often 
have trouble seeing the “obvious.” Any suggestions will be appreciated!

if ( $student_id =~
    /
        (\A[a-z]+)  # match and capture leading alphabetics
        -           # hyphen to separate surname from number
        ([0-9]+\z)  # match and capture trailing digits
    /xms            # Perl Best Practices
) {
    $student_surname = $1;
    $student_number  = $2;
}
else {
    die "Bad general form for student_id: $student_id";
}



Re: regex problem?

2015-11-25 Thread Andrew Solomon

The only problem I can see is that you want UPPERCASE-1234 and your regex
has lowercase. Try

(\A[A-Z]+)   # match and capture leading alphabetics


Andrew

p.s Why not add "use strict; use warnings", "my $var;" and wear a seat belt
when you're driving?:)
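
Putting the two suggestions together (uppercase class, plus strict and warnings), a sketch might look like the following. The sub name parse_student_id is hypothetical, and the sample ID is uppercased on the assumption that uppercase really is the required form.

```perl
use strict;
use warnings;

# parse_student_id is a hypothetical name used for illustration.
# It returns (surname, number) or dies if the ID has the wrong shape.
sub parse_student_id {
    my ($student_id) = @_;
    if ( $student_id =~ / \A ([A-Z]+) - ([0-9]+) \z /xms ) {
        return ( $1, $2 );
    }
    die "Bad general form for student_id: $student_id\n";
}

my ( $student_surname, $student_number ) = parse_student_id('TRIPLETT-1');
print "$student_surname $student_number\n";
```

With this pattern the lowercase input triplett-1 still dies, which is the intended behaviour if the rule is "uppercase letters only".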



On Wed, Nov 25, 2015 at 5:09 PM, Rick T  wrote:

> The following code apparently is not doing what I wanted. My intention was
> to confirm that the general format of  $student_id was this: several
> uppercase letters followed by a hyphen followed by several digits. If not,
> it would trigger the die. Unfortunately it seems to always trigger the die.
> For example, if I let student_id = triplett-1, the script dies. I’m a
> beginner, so I often have trouble seeing the “obvious.” Any suggestions
> will be appreciated!
>
> if ( $student_id =~
>     /
>         (\A[a-z]+)  # match and capture leading alphabetics
>         -           # hyphen to separate surname from number
>         ([0-9]+\z)  # match and capture trailing digits
>     /xms            # Perl Best Practices
> ) {
>     $student_surname = $1;
>     $student_number  = $2;
> }
> else {
>     die "Bad general form for student_id: $student_id";
> }
>
>
> --
> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
> For additional commands, e-mail: beginners-h...@perl.org
> http://learn.perl.org/
>
>
>


-- 
Andrew Solomon

Mentor@Geekuni http://geekuni.com/
http://www.linkedin.com/in/asolomon

Fwd: regex problem?

2015-11-25 Thread Raj Barath

-- Forwarded message --
From: Raj Barath <barat...@outlook.com<mailto:barat...@outlook.com>>
Date: Wed, Nov 25, 2015 at 1:16 PM
Subject: Re: regex problem?
To: Rick T <p...@reason.net<mailto:p...@reason.net>>

Hi Rick,

You can use split.

For example:
my ( $stud_surname, $stud_number ) = split ( /-/, $student_id );
You are splitting on the hyphen character.
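
One thing to keep in mind is that split only separates the parts; it does not check their form. A small sketch, with split_student_id as a hypothetical helper name and a limit of 2 added as an assumption so that any later hyphens stay in the second part:

```perl
use strict;
use warnings;

# split_student_id is a hypothetical helper. split only separates the
# two parts; it does not check that they have the expected form.
sub split_student_id {
    my ($student_id) = @_;
    my ( $stud_surname, $stud_number ) = split /-/, $student_id, 2;
    return ( $stud_surname, $stud_number );
}

my ( $surname, $number ) = split_student_id('TRIPLETT-1');
print "$surname $number\n";
```

If there is no hyphen at all, the second value comes back undefined, so a caller that needs validation would still have to test for that.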

-Raj

On Wed, Nov 25, 2015 at 1:09 PM, Rick T 
<p...@reason.net<mailto:p...@reason.net>> wrote:
The following code apparently is not doing what I wanted. My intention was to 
confirm that the general format of  $student_id was this: several uppercase 
letters followed by a hyphen followed by several digits. If not, it would 
trigger the die. Unfortunately it seems to always trigger the die. For example, 
if I let student_id = triplett-1, the script dies. I’m a beginner, so I often 
have trouble seeing the “obvious.” Any suggestions will be appreciated!

if ( $student_id =~
    /
        (\A[a-z]+)  # match and capture leading alphabetics
        -           # hyphen to separate surname from number
        ([0-9]+\z)  # match and capture trailing digits
    /xms            # Perl Best Practices
) {
    $student_surname = $1;
    $student_number  = $2;
}
else {
    die "Bad general form for student_id: $student_id";
}


Re: regex problem?

2015-11-25 Thread Shawn H Corey

On Wed, 25 Nov 2015 17:22:04 +
Andrew Solomon  wrote:

> The only problem I can see is that you want UPPERCASE-1234 and your
> regex has lowercase. Try
> 
> (\A[A-Z]+)   # match and capture leading alphabetics

Please put the anchor outside the capture. And you could use the POSIX
conventions:

m{ \A ([[:upper:]]+) }msx;

This will work with non-English characters. :)
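
As a sketch of the whole check in this style, with both anchors outside the captures: parse_upper_id is a hypothetical name, and the test string uses two uppercase Cyrillic letters written as escapes so the example does not depend on the source file's encoding.

```perl
use strict;
use warnings;

# parse_upper_id is a hypothetical helper. It returns (letters, digits)
# on success and an empty list otherwise.
sub parse_upper_id {
    my ($id) = @_;
    return $id =~ m{ \A ([[:upper:]]+) - ([0-9]+) \z }msx ? ( $1, $2 ) : ();
}

# "\x{0416}\x{0418}" is two uppercase Cyrillic letters.
my @parts = parse_upper_id("\x{0416}\x{0418}-7");
print scalar(@parts), " parts matched\n";
```

For input read from a file or a form, the data would need to be decoded into Perl characters first for the class to see letters rather than bytes.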


-- 
Don't stop where the ink does.
Shawn


A regex problem?

2012-08-13 Thread Owen

I have a web form with a text area that I feed back through a cgi
script and filter the text with;

$q1_elaborate =~ s/[^[:alpha:]' .-]//g;
quotemeta($q1_elaborate);

I admit to doing a google search on perl remove malicious code and
took that code from one of the results.(and not quite understanding
what it does)

However, it removes line feeds as well, so maybe that code is not all
that good.

Just wondering if this would be just as adequate in filtering
malicious code 

$q1_elaborate =~ s/[`\\|!\.\^]//g

TIA

-- 
Owen


Re: A regex problem?

2012-08-13 Thread Andy Bach

On Mon, Aug 13, 2012 at 5:42 AM, Owen rc...@pcug.org.au wrote:
 I have a web form with a text area that I feed back through a cgi
 script and filter the text with;

 $q1_elaborate =~ s/[^[:alpha:]' .-]//g;
 quotemeta($q1_elaborate);

 However, it removes line feeds as well, so maybe that code is not all
 that good.

Well the idea is to remove anything that might be bad but whitespace
isn't bad so change that one blank in there for the \s metachar:
$q1_elaborate =~ s/[^[:alpha:]'\s.-]//g;
 quotemeta($q1_elaborate);

The trick here is it's using a character class for the match and the
initial caret (^) negates the class, so it means replace anything
that is not an alpha, single quote, whitespace, literal period or a dash
with nothing.  However (perldoc -f quotemeta):
 quotemeta EXPR
 quotemeta
   Returns the value of EXPR with all non-word characters
   backslashed.  (That is, all characters not matching
   /[A-Za-z_0-9]/ will be preceded by a backslash in the
   returned string, regardless of any locale settings.)  This is
   the internal function implementing the \Q escape in
   double-quoted strings.

The key there being returns - so I believe you'd want
$q1_elaborate = quotemeta($q1_elaborate);

Finally, while it probably doesn't matter here, IMNSHO, you should
check your matching and react accordingly. If $q1_elaborate has one of
the non-valid chars, do you care?
if ( $q1_elaborate =~ s/[^[:alpha:]'\s.-]//g ) {
    # if appropriate
    warn("Non-valid chars in q1_elaborate\n");
}
$q1_elaborate = quotemeta($q1_elaborate);

Again, not a big gain here, but as a rule of thumb - doing your
match/subst in an if or if/else will give you a more robust program.
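
The same steps can be wrapped in one routine. This is only a sketch: clean_text is a hypothetical name, and whether this filtering is adequate depends on where the text goes next (a database, a shell command, an HTML page), which the thread does not say.

```perl
use strict;
use warnings;

# clean_text is a hypothetical helper. It strips everything except
# letters, apostrophes, whitespace, periods and hyphens, warns with
# the number of characters removed, and returns the quotemeta'd result.
sub clean_text {
    my ($text) = @_;
    my $removed = ( $text =~ s/[^[:alpha:]'\s.-]//g ) || 0;
    warn "Removed $removed non-valid chars\n" if $removed;
    return quotemeta($text);
}

print clean_text("It's fine.\nNext line"), "\n";
```

Note that quotemeta backslashes the whitespace, apostrophes and periods that the substitution deliberately kept, so the returned string is meant for interpolation into a pattern rather than for display.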

-- 

a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk


regex problem

2010-11-05 Thread jm

i have csv files in the following format, where some fields are
enclosed in double quotes if they have commas embedded in them and all
other fields are simply comma-delimited without any encapsulation,
such as

 some,data,more,data,numbers,etc,"data with a , in the datastream",yet more data,"possibly more embedded ,'s",and,so,on,,,

changing the formatting of the source file to enclose all fields in
double quotes is not an option.  i'm trying to figure out a regex,
split, or some other functionality that will allow me to either

1. wrap each 'bare' field in double quotes (ignoring the embedded
commas in the encapsulated fields) or
2. extract each field, automatically determining if commas should be
ignored inside double quotes

i know it should be relatively simple but i'm not yet fluent enough in
regex to grasp the necessary double quote exceptions.  any help is
greatly appreciated.

tia,
joe

-- 
since this is a gmail account, please verify the mailing list is
included in the reply to addresses


Re: regex problem

2010-11-05 Thread Shawn H Corey


On 10-11-05 09:34 AM, jm wrote:

i have csv files in the following format, where some fields are
enclosed in double quotes if they have commas embedded in them and all
other fields are simply comma-delimited without any encapsulation


The best way to deal with CSV is to use a module from CPAN.

Text::CSV  http://search.cpan.org/~makamaka/Text-CSV-1.20/lib/Text/CSV.pm

Text::CSV_XS  http://search.cpan.org/~hmbrand/Text-CSV_XS-0.76/CSV_XS.pm


--
Just my 0.0002 million dollars worth,
  Shawn

Programming is as much about organization and communication
as it is about coding.

The secret to great software:  Fail early & often.

Eliminate software piracy:  use only FLOSS.


Re: regex problem

2010-11-05 Thread Robert Wohlfarth

On Fri, Nov 5, 2010 at 8:34 AM, jm jm5...@gmail.com wrote:

 changing the formatting of the source file to enclose all fields in
 double quotes is not an option.  i'm trying to figure out a regex,
 split, or some other functionality that will allow me to either

 1. wrap each 'bare' field in double quotes (ignoring the embedded
 commas in the encapsulated fields)or
 2. extract each field, automatically determining if commas should be
 ignored inside double quotes


Try the Text::CSV module
<http://search.cpan.org/~makamaka/Text-CSV-1.20/lib/Text/CSV.pm>.
It handles all of these details for you.

-- 
Robert Wohlfarth

Re: regex problem

2010-11-05 Thread jm

i appreciate the tips.  unfortunately, adding modules to this server
is not currently possible.  does anyone have a more 'hands-on'
solution?


On Fri, Nov 5, 2010 at 8:53 AM, Shawn H Corey shawnhco...@gmail.com wrote:
 On 10-11-05 09:34 AM, jm wrote:

 i have csv files in the following format, where some fields are
 enclosed in double quotes if they have commas embedded in them and all
 other fields are simply comma-delimited without any encapsulation

 The best way to deal with CSV is to use a module from CPAN.

 Text::CVS  http://search.cpan.org/~makamaka/Text-CSV-1.20/lib/Text/CSV.pm

 Text::CSV_XS  http://search.cpan.org/~hmbrand/Text-CSV_XS-0.76/CSV_XS.pm


 --
 Just my 0.0002 million dollars worth,
  Shawn

 Programming is as much about organization and communication
 as it is about coding.

 The secret to great software:  Fail early  often.

 Eliminate software piracy:  use only FLOSS.

 --
 To unsubscribe, e-mail: beginners-unsubscr...@perl.org
 For additional commands, e-mail: beginners-h...@perl.org
 http://learn.perl.org/






-- 
since this is a gmail account, please verify the mailing list is
included in the reply to addresses


RE: regex problem

2010-11-05 Thread Ken Slater

From: jm [mailto:jm5...@gmail.com] 
Sent: Friday, November 05, 2010 10:21 AM

i appreciate the tips.  unfortunately, adding modules to this server
is not currently possible.  does anyone have a more 'hands-on'
solution?

Take a look at the Text::ParseWords module. I believe it should be
installed. perldoc Text::ParseWords.
I have used it for similar problems in the past.
Ken
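
A minimal sketch of how Text::ParseWords could be applied to both options from the original question. The helper names split_csv_line and requote are hypothetical, and the sample line is a shortened stand-in for the real data.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# split_csv_line and requote are hypothetical helper names.
# parse_line(delimiter, keep, line): with keep = 0 the surrounding
# quotes are removed from the returned fields.
sub split_csv_line {
    my ($line) = @_;
    return parse_line( ',', 0, $line );
}

# Option 1 from the question: wrap every field in double quotes.
sub requote {
    my ($line) = @_;
    return join ',', map { qq{"$_"} } split_csv_line($line);
}

my $sample = 'some,data,"data with a , in it",more';
print requote($sample), "\n";
```

Two caveats worth testing against real data: the module also treats single quotes as quoting characters, so an apostrophe in an unquoted field can confuse it, and an empty field at the very end of a line may come back undefined.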

On Fri, Nov 5, 2010 at 8:53 AM, Shawn H Corey shawnhco...@gmail.com
wrote:
 On 10-11-05 09:34 AM, jm wrote:

 i have csv files in the following format, where some fields are
 enclosed in double quotes if they have commas embedded in them and all
 other fields are simply comma-delimited without any encapsulation

 The best way to deal with CSV is to use a module from CPAN.

 Text::CVS  http://search.cpan.org/~makamaka/Text-CSV-1.20/lib/Text/CSV.pm

 Text::CSV_XS  http://search.cpan.org/~hmbrand/Text-CSV_XS-0.76/CSV_XS.pm


 Just my 0.0002 million dollars worth,
  Shawn







Regex problem

2009-12-26 Thread Owen


To check the date passed with a script, I first check that the date
is in the format 20dd (20 followed by 6 digits exactly)

But the regex is wrong, tried /^20\d{6}/,/^20\d{6,6}?/,/^20\d{6,}?/ and
while a 7 or lesser digit number fails, eg 2009101, a 9 digit number,
like 200910103 does not fail.



unless ( $ARGV[2] =~ /^20\d{6}?/ ) { print
"$ARGV[2]\tdate format is MMDD, eg 20091031\n"; }


How do I get the regex to fail a 9 digit number


I suppose as a work around, I could say;

unless ((length($ARGV[2]) == 8) and ( $ARGV[2] =~ /^20/)){fail}

TIA


Owen


Re: Regex problem

2009-12-26 Thread Chris Charley



- Original Message - 
From: Owen rc...@pcug.org.au

Newsgroups: perl.beginners

Hello Owen




To check the date passed with a script, I first check that the date
is in the format 20dd (20 followed by 6 digits exactly)

But the regex is wrong, tried /^20\d{6}/,/^20\d{6,6}?/,/^20\d{6,}?/ and
while a 7 or lesser digit number fails, eg 2009101, a 9 digit number,
like 200910103 does not fail.



unless ( $ARGV[2] =~ /^20\d{6}?/ ) { print




unless ( $ARGV[2] =~ /^20\d{6}$/) ...

If the end of line anchor is used, '$', the regex will accept an 8 digit 
number if it's the only entry in $ARGV[2]
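
A short sketch of the anchored check, run against the three numbers from the question. The sub name looks_like_date is hypothetical. It uses \A and \z rather than ^ and $, because $ would also accept a trailing newline; that stricter choice is an assumption about what is wanted.

```perl
use strict;
use warnings;

# looks_like_date is a hypothetical name. \A and \z anchor the pattern
# to the very start and end of the string, so exactly eight characters
# are accepted: "20" followed by six digits.
sub looks_like_date {
    my ($arg) = @_;
    return $arg =~ /\A20[0-9]{6}\z/ ? 1 : 0;
}

for my $candidate (qw(2009101 20091031 200910103)) {
    printf "%-10s %s\n", $candidate,
        looks_like_date($candidate) ? 'ok' : 'rejected';
}
```

This checks the shape only; it would still accept a month of 13 or a day of 99.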


Chris


$ARGV[2]\tdate format is MMDD, eg 20091031\n; }


How do I get the regex to fail a 9 digit number


I suppose as a work around, I could say;

unless ((length($ARGV[2]) == 8) and ( $ARGV[2] =~ /^20/){fail}

TIA


Owen 




Regex problem

2009-12-21 Thread jbl

I have a lengthy list of data that I read in. I have substituted a one
line example using __DATA__.
The desired output would be
91416722  243rd St

I am getting this as output

91416722rd St   - just the rd St

The capturing reference on (\s)..$1

is not working

# Intent
# Look for 243 preceded by any white space, followed by a space char
# Capture the whitespace as $1
# Replace with whatever the leading whitespace was, then the number,
then the suffix rd and then the trailing space char

Basically add the suffix rd to the number 243, ie...243rd
I can do something else but I was wondering what I am doing wrong here
Thanks
jbl


#!/usr/bin/perl -w
use strict;

open MY_OUTPUT_FILE, ">Export_Output_mod.txt"
    or die "Can't write to out.txt: $!";

while ( defined ( my $line = <DATA> ) ) {
    $line =~ s/(\s)243 /$1243rd /g;
    print MY_OUTPUT_FILE $line;
}

close MY_OUTPUT_FILE;

__END__
91416722  243 St



Re: Regex problem

2009-12-21 Thread Shawn H Corey

jbl wrote:
 I have a lengthy list of data that I read in. I have substituted a one
 line example using __DATA__.
 The desired output would be
 91416722  243rd St
 
 I am getting this as output
 
 91416722rd St   - just the rd St
 
 The capturing reference on (\s)..$1
 
 is not working
 
 # Intent
 # Look for 243 preceded by any white space, followed by a space char
 # Capture the whitespace as $1
 # Replace with whatever the leading whitespace was, then the number,
 then the suffix rd and then the trailing space char
 
 Basically add the suffix rd to the number 243, ie...243rd
 I can do something else but I was wondering what I am doing wrong here
 Thanks
 jbl
 
 
 #!/usr/bin/perl -w
  use strict;
 
 open MY_OUTPUT_FILE, ">Export_Output_mod.txt" or die "Can't write to
 out.txt: $!";
 
  while ( defined ( my $line = <DATA> ) ) {
$line =~ s/(\s)243 /$1243rd /g;

 $line =~ s/(\s)243 /${1}243rd /g;


print MY_OUTPUT_FILE $line;
}
 
 close MY_OUTPUT_FILE;
 
 __END__
 91416722  243 St
 
 


-- 
Just my 0.0002 million dollars worth,
  Shawn

Programming is as much about organization and communication
as it is about coding.

I like Perl; it's the only language where you can bless your
thingy.


Re: Regex problem

2009-12-21 Thread Robert Wohlfarth

On Mon, Dec 21, 2009 at 9:11 AM, jbl jbl...@gmail.com wrote:

 The desired output would be
 91416722243rd St

 I am getting this as output

 91416722rd St   - just the rd St

 snip

 while ( defined ( my $line = <DATA> ) ) {
   $line =~ s/(\s)243 /$1243rd /g;
   print MY_OUTPUT_FILE $line;
   }


Try this: $line =~ s/(\s)243 /${1}243rd /g;

Without the braces, Perl is looking for match number 1,243! Braces separate
the 1 from the 243.
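
A small sketch of the braces fix applied to the sample line, with add_rd as a hypothetical wrapper name:

```perl
use strict;
use warnings;

# add_rd is a hypothetical helper wrapping the corrected substitution.
# ${1} ends the variable name at the brace, so "243rd" is literal text.
sub add_rd {
    my ($line) = @_;
    $line =~ s/(\s)243 /${1}243rd /g;
    return $line;
}

print add_rd("91416722  243 St"), "\n";
```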

-- 
Robert Wohlfarth

Re: Regex problem

2009-03-11 Thread howa

Hi,

On Mar 11, 1:16 am, nore...@gunnar.cc (Gunnar Hjalmarsson) wrote:

 I would do:

      if ( $a =~ /\.(?:html|jpg)$/i )

 Please read http://perldoc.perl.org/perlretut.html and other appropriate
 docs.

Read the doc, but how to negate the Non-capturing groupings ?

use strict;

my $a = 'a.gif';

if ($a =~ /^(?:html|jpg)/gi) {
print 'not html or jpg';

}

Thanks,



Regex problem, #.*# on new line

2009-03-11 Thread Brent Clark


Hiya

I have a string like the one below, and for the life of me I can't get a
regex to break it up so that each line starts with #abc#.


my $a =
"#aaa#message:details;extra:info;variable:times;#bbb#message:details;extra:info;variable:times;#ccc#not:always;the:same;ts:14:00.00;";

$a =~ s/(?!#.#)/$1\n/i;

I'm so desperate I even tried something as silly as

join( "\n", split(/#.*#/,
"#aaa#message:details;extra:info;variable:times;#bbb#message:details;extra:info;variable:times;#ccc#not:always;the:same;ts:14:00.00;"))


if anyone can help, it would so appreciated.

Kind Regards
Brent Clark


Re: Regex problem

2009-03-11 Thread Jim Gibson

On 3/10/09 Tue  Mar 10, 2009  8:41 PM, howa howac...@gmail.com
scribbled:

 Hi,
 
 On Mar 11, 1:16 am, nore...@gunnar.cc (Gunnar Hjalmarsson) wrote:
 
 I would do:
 
      if ( $a =~ /\.(?:html|jpg)$/i )
 
 Please read http://perldoc.perl.org/perlretut.html and other appropriate
 docs.
 
 Read the doc, but how to negate the Non-capturing groupings ?
 
 use strict;
 
 my $a = 'a.gif';
 
 if ($a =~ /^(?:html|jpg)/gi) {
 print 'not html or jpg';
 
 }

That will test if $a starts with 'html' or 'jpg'. To test for a non-match,
use the !~ operator:

if ( $a !~ /(?:html|jpg)$/i ) {
  print "not html or jpg\n";
}




Re: Regex problem

2009-03-11 Thread howa

Hello,

On Mar 12, 12:34 am, jimsgib...@gmail.com (Jim Gibson) wrote:
 That will test if $a starts with 'html' or 'jpg'. To test for a non-match,
 use the !~ operator:


I can't, since I will add more criteria into the regex,

e.g.

I need to match a.* , except a.html or a.jpg

 if ( $a =~ /a\.(?:html|jpg)$/i ) # of course this one does not work.


Thanks.



Re: Regex problem

2009-03-11 Thread Chas. Owens

On Wed, Mar 11, 2009 at 12:53, howa howac...@gmail.com wrote:
 Hello,

 On Mar 12, 12:34 am, jimsgib...@gmail.com (Jim Gibson) wrote:
 That will test if $a starts with 'html' or 'jpg'. To test for a non-match,
 use the !~ operator:


 I can't, since I will add more criteria into the regex,

 e.g.

 I need to match a.* , except a.html or a.jpg

  if ( $a =~ /a\.(?:html|jpg)$/i ) # of course this one does not work.
snip

You want a zero-width-negative-look-ahead:

#!/usr/bin/perl

use strict;
use warnings;

my @a = qw/a.html a.jpg a.gif/;

for my $s (@a) {
    print "$s ", $s =~ /a[.](?!html|jpg)/ ? "matches" : "does not match",
        "\n";
}
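
The look-ahead above rejects any name whose extension merely begins with html or jpg. If only the exact extensions should be excluded, one possible variation is to anchor inside the look-ahead. This is a sketch under that assumption, with is_wanted as a hypothetical name:

```perl
use strict;
use warnings;

# is_wanted is a hypothetical name. It accepts "a.<anything>" except
# when the extension is exactly html or jpg (case-insensitive).
sub is_wanted {
    my ($name) = @_;
    return $name =~ /\Aa[.](?!(?:html|jpg)\z)/i ? 1 : 0;
}

for my $s (qw(a.html a.jpg a.gif a.jpgx)) {
    print "$s ", is_wanted($s) ? "matches" : "does not match", "\n";
}
```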

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.


Re: Regex problem, #.*# on new line

2009-03-11 Thread John W. Krahn


Brent Clark wrote:

Hiya


Hello,

I got a string like so, and for the likes of me I can get regex to have 
it that each line is starts with #abc#.


my $a = 
"#aaa#message:details;extra:info;variable:times;#bbb#message:details;extra:info;variable:times;#ccc#not:always;the:same;ts:14:00.00;";


$a =~ s/(?!#.#)/$1\n/i;


You are using a zero-width negative look-ahead assertion, which does not 
capture its contents.  You are using a pattern that matches a total of 
three characters but you say you want to match five characters.  You are 
using the /i option but there are no characters in the pattern that are 
affected by the /i option.


You probably want something like:

$x =~ s/(#...#[^#]*)/$1\n/g;
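
Applied to a shortened version of the sample string, the substitution behaves as in this sketch (break_records is a hypothetical wrapper name):

```perl
use strict;
use warnings;

# break_records is a hypothetical helper around the substitution above.
# Each "#xxx#" tag plus everything up to the next "#" gets a newline.
sub break_records {
    my ($x) = @_;
    $x =~ s/(#...#[^#]*)/$1\n/g;
    return $x;
}

print break_records('#aaa#message:details;#bbb#extra:info;#ccc#ts:14:00.00;');
```

The pattern relies on "#" never appearing inside a record body and on tags being exactly three characters long.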



John
--
Those people who think they know everything are a great
annoyance to those of us who do.-- Isaac Asimov


Regex problem

2009-03-10 Thread howa

Hello,

Consider the code:
#===

use strict;

my $a = 'a.jpg';

if ($a =~ /(html|jpg)/gi) {
print 'ok';
}

#===


Are the brackets () really needed? Since I am not using a back
reference, is there a better way?

Thanks.



Re: Regex problem

2009-03-10 Thread Jim Gibson

On 3/10/09 Tue  Mar 10, 2009  8:19 AM, howa howac...@gmail.com
scribbled:

 Hello,
 
 Consider the code:
 #===
 
 use strict;
 
 my $a = 'a.jpg';
 
 if ($a =~ /(html|jpg)/gi) {
 print 'ok';
 }
 
 #===
 
 
 Is the brucket () must be needed? Since I am not using back
 reference, are there a better way?

No, the parentheses are not needed in this simple case. The pattern
/html|jpg/i will work fine (you don't need the 'g' modifier since you are
only looking for one match).

However, if you want other elements in your pattern, you may need
parentheses to group sub-elements. For example, if you wanted to match only
if the 'html' or 'jpg' were at the end of the string, then /html|jpg$/ will
not work, as this pattern will match 'html' anywhere in the string. You will
have to use /html$|jpg$/ or /(html|jpg)$/. Non-capturing parentheses can be
used for clustering without capturing, as in /(?:html|jpg)$/.

You can make your regexs a little more readable with the 'x' modifier:

/ html | jpg /ix
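
The difference between the anchored forms can be seen by running them side by side. A small sketch, where the file name a.html.bak is a made-up test case:

```perl
use strict;
use warnings;

# Three ways of writing the end-of-string test, keyed by their source text.
my %pattern = (
    'html|jpg$'     => qr/html|jpg$/i,        # html anywhere, or jpg at the end
    'html$|jpg$'    => qr/html$|jpg$/i,       # either one at the end
    '(?:html|jpg)$' => qr/(?:html|jpg)$/i,    # same, with grouping
);

for my $name (qw(a.jpg a.html.bak)) {
    for my $label ( sort keys %pattern ) {
        printf "%-12s %-14s %s\n", $name, $label,
            $name =~ $pattern{$label} ? 'match' : 'no match';
    }
}
```

Only the first pattern matches a.html.bak, because the $ there binds to jpg alone.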



 




Re: Regex problem

2009-03-10 Thread Chas. Owens

On Tue, Mar 10, 2009 at 11:19, howa howac...@gmail.com wrote:
 Hello,

 Consider the code:
 #===

 use strict;

 my $a = 'a.jpg';

 if ($a =~ /(html|jpg)/gi) {
    print 'ok';
 }

 #===


 Is the brucket () must be needed? Since I am not using back
 reference, are there a better way?
snip

Since you have no other patterns in the regex you do not need the
parentheses.  If you had other patterns and wanted to avoid the
slowdown associated with backreferences you could use the
group-non-capturing parentheses:

/foo[.](?:html|jpg)/

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.


Re: Regex problem

2009-03-10 Thread Gunnar Hjalmarsson


howa wrote:

Hello,

Consider the code:
#===

use strict;

my $a = 'a.jpg';

if ($a =~ /(html|jpg)/gi) {
print 'ok';
}

#===


Is the brucket () must be needed?


Parentheses. What happened when you tried without them? And why the /g 
modifier?



Since I am not using back reference, are there a better way?


I would do:

if ( $a =~ /\.(?:html|jpg)$/i )

Please read http://perldoc.perl.org/perlretut.html and other appropriate 
docs.


--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


RE: Simple regex problem has me baffled

2009-01-27 Thread Bill Harpley

Hi Shawn,


Here is the revised code fragment:


open ( DATA, "<$INBOX/nlsrysows001_2090125.dat") || die "Cannot open source file: $!";
open ( FILE, ">$INBOX/request.dat") || die "Cannot open request file: $!";

chomp(@list=<DATA>);

foreach $entry(@list)
{

$entry =~ /\[([a-z0-9]{5})\]/;

$req_id=$1;


print "$req_id\n";
}

But I still get errors !!




8252c
Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 19, <DATA> line 1044.

8252c
Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 19, <DATA> line 1044.

8252d
Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 19, <DATA> line 1044.

8252d
Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 19, <DATA> line 1044.



What is especially puzzling is that I have seen notation such as 'print
"$1\n";' in other scripts.

Regards,

Bill Harpley
















 


RE: Simple regex problem has me baffled

2009-01-27 Thread Bill Harpley

Hi John,

Thanks for your advice. 

(1) I actually have 'use warnings' enabled in the complete script.
'use strict' just gives me a load of unrelated compilation errors
like
Global symbol "$INBOX" requires explicit package name at
./magic.pl line 8.

(2) I have also tried a WHILE loop but the result is the same 

(3) the hex digits in the Request_Id all have A-F in lower case (so
there is only the range a-z)
However, it does no harm to put this in, just in case it changes in
the future. So I have made this change

 (4) I tried [[:xdigit:]] but to no avail

So I remain stuck at square one !!

Regards,

Bill




RE: Simple regex problem has me baffled

2009-01-27 Thread Bill Harpley

Hi Gunnar,

I tried your suggestions but had no luck :-(

(1)  I tried your idea of using a paragraph separator


 local $/ = '';  # paragraph mode
 while ( my $entry = <DATA> ) {
     if ( $entry =~ /\[([a-z0-9]{5})]/ ) {
         print "$1\n";
     }
 }

 But the only output which I got was:

# script.pl
8252c

So it found the first line and then quit. So the separator is
obviously the usual "\n";


At some point, I was planning to convert the long wrapped
lines into a single long line, to make the later timestamp analysis
easier.
This is how the event records appear in the log:


[2009-01-25 02:21:13,760]TRACE [server-1] [http-80-12]
u...@mydomain.net:090125-022113763:4c213
(LimitVoIPLineImpl.java:call:54)
;- RequestId [8252c] LimitVoIPLine.REQ { accountNumber:=W1931627,
phoneNumber:=1234512345 }
;[2009-01-25 02:21:22,104]TRACE [server-1] [http-80-12]
u...@mydomain.net:090125-022113763:4c213
(LimitVoIPLineImpl.java:call:57)
;- RequestId [8252c] LimitVoIPLine.RES { LimitVoIPLine Result {
Result:=Success } }
;[2009-01-25 02:21:34,675]TRACE [server-1] [http-80-20]
u...@mydomain.net:090125-022134678:467d0
(LimitVoIPLineImpl.java:call:54)
;- RequestId [8252d] LimitVoIPLine.REQ { accountNumber:=W1931627,
phoneNumber:=31455491773 }
;[2009-01-25 02:21:41,354]TRACE [server-1] [http-80-20]
u...@mydomain.net:090125-022134678:467d0
(LimitVoIPLineImpl.java:call:57)
;- RequestId [8252d] LimitVoIPLine.RES { LimitVoIPLine Result {
Result:=Success } }
;[2009-01-25 09:26:27,148]TRACE [server-1] [http-80-8]
u...@mydomain.net:090125-092627068:48de4
;(GetCallForwardStatusImpl.java:call:52) - RequestId [82534]
GetCallForwardStatus.REQ { accountNumber:=W1576824,
phoneNumber:=1234512345
;}
;[2009-01-25 09:26:27,153]TRACE [server-1] [http-80-12]
u...@mydomain.net:090125-092627077:5d89f
;(GetRestrictionListImpl.java:call:53) - RequestId [82535]
GetRestrictionList.REQ { accountNumber:=W1576824,
phoneNumber:=1234512345 }


 So a single event record can be split across several lines
( I assume this is not just a terminal wrap problem). 

Is this what you meant when you said "Probably because your
code splits each entry into multiple @list elements"?

Would it be better to convert each record into a single long
line before trying to perform regex match? Is there an easy way to do
this?

 

 
Regards,
 
Bill Harpley






RE: Simple regex problem has me baffled

2009-01-27 Thread Bill Harpley

Rob,

Thanks for your suggestion.  It worked!!

# script.pl
8252c
8252c
8252d
8252d
82534
82535
82535
82534
8253c
8253c
8253f
8253f
82542
82543
- big long list -

So this is what did the trick:

  while (<DATA>) {
    next unless /RequestId \[([[:xdigit:]]+)\]/;
    print "$1\n";
  }

Can you explain why this works but my original effort did not?

Many thanks,

Bill Harpley




Re: Simple regex problem has me baffled

2009-01-27 Thread Gunnar Hjalmarsson


Bill Harpley wrote:

Hi Gunnar,

I tried your suggestions but had no luck :-(

(1)  I tried your idea of using a paragraph separator


 local $/ = '';  # paragraph mode
 while ( my $entry = <DATA> ) {
     if ( $entry =~ /\[([a-z0-9]{5})]/ ) {
         print "$1\n";
     }
 }

 But the only output which I got was:

# script.pl
8252c


So it found the first line and then quit. So the separator is
obviously the usual "\n";


So it seems.


At some point, I was planning to convert the long wrapped
lines into a single long line, to make the later timestamp analysis
easier.
This is how the event records appear in the log:

[2009-01-25 02:21:13,760]TRACE [server-1] [http-80-12]
u...@mydomain.net:090125-022113763:4c213
(LimitVoIPLineImpl.java:call:54)
;- RequestId [8252c] LimitVoIPLine.REQ { accountNumber:=W1931627,
phoneNumber:=1234512345 }
;[2009-01-25 02:21:22,104]TRACE [server-1] [http-80-12]
u...@mydomain.net:090125-022113763:4c213
(LimitVoIPLineImpl.java:call:57)
;- RequestId [8252c] LimitVoIPLine.RES { LimitVoIPLine Result {
Result:=Success } }
;[2009-01-25 02:21:34,675]TRACE [server-1] [http-80-20]
u...@mydomain.net:090125-022134678:467d0
(LimitVoIPLineImpl.java:call:54)
;- RequestId [8252d] LimitVoIPLine.REQ { accountNumber:=W1931627,
phoneNumber:=31455491773 }
;[2009-01-25 02:21:41,354]TRACE [server-1] [http-80-20]
u...@mydomain.net:090125-022134678:467d0
(LimitVoIPLineImpl.java:call:57)
;- RequestId [8252d] LimitVoIPLine.RES { LimitVoIPLine Result {
Result:=Success } }
;[2009-01-25 09:26:27,148]TRACE [server-1] [http-80-8]
u...@mydomain.net:090125-092627068:48de4
;(GetCallForwardStatusImpl.java:call:52) - RequestId [82534]
GetCallForwardStatus.REQ { accountNumber:=W1576824,
phoneNumber:=1234512345
;}
;[2009-01-25 09:26:27,153]TRACE [server-1] [http-80-12]
u...@mydomain.net:090125-092627077:5d89f
;(GetRestrictionListImpl.java:call:53) - RequestId [82535]
GetRestrictionList.REQ { accountNumber:=W1576824,
phoneNumber:=1234512345 }

 So a single event record can be split across several lines
( I assume this is not just a terminal wrap problem). 


Is this what you meant when you said "Probably because your
code splits each entry into multiple @list elements"?


Yes.


Would it be better to convert each record into a single long
line before trying to perform regex match?


Well, it might make the next steps easier, but at first hand we ought to 
let Perl do the job, right?


Even if paragraph mode is not applicable, since you are going to 
analyze the log file, somehow it makes sense to separate the log entries 
from each other. Now when I know a little more about the structure of 
the log, this is what I would try next:


local $/ = "}\n;";
while ( my $entry = <DATA> ) {
    if ( $entry =~ /\[([a-z0-9]{5})]/ ) {
        print "$1\n";
    }
}
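
For anyone who wants to try the custom record separator without the real log, here is a minimal self-contained sketch. It reads two abbreviated records from an in-memory filehandle; the lexical handle, the trimmed sample text and the slightly stricter "RequestId" pattern are choices made for the example, not part of the original script.

```perl
use strict;
use warnings;

# Two abbreviated records in the layout shown above: a record ends with
# "}" plus a newline, and the next record starts with ";".
my $log = join '',
    "[2009-01-25 02:21:13,760]TRACE [server-1] [http-80-12]\n",
    ";- RequestId [8252c] LimitVoIPLine.REQ { accountNumber:=W1931627,\n",
    "phoneNumber:=1234512345 }\n",
    ";[2009-01-25 02:21:22,104]TRACE [server-1] [http-80-12]\n",
    ";- RequestId [8252c] LimitVoIPLine.RES { LimitVoIPLine Result {\n",
    "Result:=Success } }\n";

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my @ids;
{
    local $/ = "}\n;";    # read one whole record per iteration
    while ( my $entry = <$fh> ) {
        # Only use $1 when this record actually matched.
        if ( $entry =~ /RequestId\s+\[([[:xdigit:]]{5})\]/ ) {
            push @ids, $1;
        }
    }
}

print "$_\n" for @ids;
```

Each record now yields its ID exactly once, which is what the later REQ/RES pairing needs.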

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


Simple regex problem has me baffled

2009-01-26 Thread Bill Harpley

Hello,

I have simple regex problem that is driving me crazy.

I am writing a script to analyse a log file. It contains Java related
information about requests and responses.

Each pair of Request (REQ) and Response (RES) calls have a unique
Request ID. This is a 5 digit hex number contained in square brackets
(e.g.  [81c2d] ).

Using timestamps in each log entry, I need to calculate the time
difference between the start of the Request and the end of the Response.

As a first step, I thought I would identify the matching REQ/RES pairs
in the log and then set about extracting the timestamp information and
doing the calculations.

I started with a simple script to extract the Request IDs from each log
entry. Here is what one looks like (names have been changed to protect
the innocent).


[2009-01-23 09:20:48,719]TRACE [server-1] [http-80-5] a...@mydomain.net
:090123-092048567:f5825 (SetCallForwardStatusImpl.java:call:54) -
RequestId [81e80] SetCallForwardStatus.REQ { accountNumber:=W12345,
phoneNumber:=12121212121, onBusyStatus:=true, busyCurrent:=voicemail,
onNoAnswerStatus:=false, noAnswerCurent:=voicemail,
onUncondStatus:=false, uncondCurrent:=voicemail }

So I need to extract the 5 hex digits in RequestId [81e80]. Sounds
simple, eh?

Here is a fragment of my initial script:

open ( DATA, "< $INBOX/sample.log")
    || die "Cannot open source file: $!";
open ( FILE, "> $INBOX/request.dat")
    || die "Cannot open request file: $!";

chomp(@list=<DATA>);

foreach $entry(@list)
{

    $entry =~ /\[([a-z0-9]{5})\]/;

    print "$1\n";   # print to screen

    # print FILE "$1\n";    # print to file
}

I have spent quite a bit of time refining this expression and it looks
OK to me. I basically just need to extract the 5-digit hex string and
then write it to a file (or to screen).

This is what I get when I run the script:

Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 16, <DATA> line 1044.

8252c
Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 16, <DATA> line 1044.

8252c
Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 16, <DATA> line 1044.

8252d
Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 16, <DATA> line 1044.

8252d
Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 16, <DATA> line 1044.

82534
82534
Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 16, <DATA> line 1044.

82535
Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 16, <DATA> line 1044.

82534
82534
82534
Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 16, <DATA> line 1044.

8253c
8253c
8253c
Use of uninitialized value in concatenation (.) or string at ./magic.pl
line 16, <DATA> line 1044


 --- Big long list --note that RequestIDs from REQ/RES pairs need not
be adjacent in the list -- 

The first thing that puzzles me is that although it is obviously
extracting the RequestId substring correctly, it seems to complain about
the "$1\n" expression in line 16.
This looks quite OK to me and I am baffled why I am getting this
message.

The other thing that puzzles me is that there can only be a single
REQ/RES pair in the file with a given ID. So the RequestID should not
appear more than twice in the output list. Yet there are many instances
where the RequestID appears more than twice.

Any help you guys can provide would be much appreciated. The Perl
version is 5.8.4 on Solaris 10.


Regards,

Bill Harpley

Re: Simple regex problem has me baffled

2009-01-26 Thread Mr. Shawn H. Corey

On Mon, 2009-01-26 at 16:20 +0100, Bill Harpley wrote:
 foreach $entry(@list)
 {
 
     $entry =~ /\[([a-z0-9]{5})\]/;
 
     print "$1\n";   # print to screen
 
     # print FILE "$1\n";    # print to file
 }

If there is no match, you are printing an uninitialized value; try:

foreach my $entry ( @list ){
  if( $entry =~ m{ \[ ( [a-z0-9]{5} ) \] }msx ){
    my $request_id = $1;
    # ...
  }
}
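
A runnable sketch of the same guarded loop, assuming three sample lines taken from the log excerpt quoted elsewhere in this thread; the array @request_ids is a name introduced only for the example.

```perl
use strict;
use warnings;

# One log entry split over three physical lines; only the middle line
# carries a five-character ID in square brackets.
my @list = (
    '[2009-01-25 02:21:13,760]TRACE [server-1] [http-80-12]',
    ';- RequestId [8252c] LimitVoIPLine.REQ { accountNumber:=W1931627,',
    'phoneNumber:=1234512345 }',
);

my @request_ids;
foreach my $entry (@list) {
    if ( $entry =~ m{ \[ ( [a-z0-9]{5} ) \] }msx ) {
        push @request_ids, $1;    # reached only after a successful match
    }
}

print "$_\n" for @request_ids;
```

Lines without an ID are skipped silently, so there is neither a warning nor a repeated value.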


-- 
Just my 0.0002 million dollars worth,
  Shawn

It would appear that we have reached the limits of what it is
 possible to achieve with computer technology, although one should
 be careful with such statements, as they tend to sound pretty silly
 in 5 years.
   --John von Neumann, circa 1960



Re: Simple regex problem has me baffled

2009-01-26 Thread John W. Krahn


Bill Harpley wrote:

Hello,


Hello,


I have simple regex problem that is driving me crazy.

I am writing a script to analyse a log file. It contains Java related
information about requests and responses.

Each pair of Request (REQ) and Response (RES) calls have a unique
Request ID. This is a 5 digit hex number contained in square brackets
(e.g.  [81c2d] ).

Using timestamps in each log entry, I need to calculate the time
difference between the start of the Request and the end of the Response.

As a first step, I thought I would identify the matching REQ/RES pairs
in the log and then set about extracting the timestamp information and
doing the calculations.

I started with a simple script to extract the Request IDs from each log
entry. Here is what one looks like (names have been changed to protect
the innocent).


[2009-01-23 09:20:48,719]TRACE [server-1] [http-80-5] a...@mydomain.net
:090123-092048567:f5825 (SetCallForwardStatusImpl.java:call:54) -
RequestId [81e80] SetCallForwardStatus.REQ { accountNumber:=W12345,
phoneNumber:=12121212121, onBusyStatus:=true, busyCurrent:=voicemail,
onNoAnswerStatus:=false, noAnswerCurent:=voicemail,
onUncondStatus:=false, uncondCurrent:=voicemail }

So I need to extract the 5 hex digits in RequestId [81e80]. Sounds
simple, eh?

Here is a fragment of my initial script:


You should have the warnings and strict pragmas at the beginning of your 
program to let perl help you find mistakes:


use warnings;
use strict;



open ( DATA, "< $INBOX/sample.log")
    || die "Cannot open source file: $!";
open ( FILE, "> $INBOX/request.dat")
    || die "Cannot open request file: $!";

chomp(@list=<DATA>);

foreach $entry(@list)
{


It looks like you don't really have to read the entire file into memory 
in order to process it.  You should perhaps use a while loop instead 
which will only read one line at a time:


while ( my $entry = <DATA> ) {
    chomp $entry;

And you may not need to chomp the current line if you are not accessing 
the data at the end of the line.




$entry =~ /\[([a-z0-9]{5})\]/;


You are looking for hexadecimal digits so you want either [a-fA-F0-9] or 
[[:xdigit:]] instead.




print "$1\n"; # print to screen


The contents of $1 are only valid if the regular expression matched 
successfully, otherwise $1 retains the contents from the previously 
successful match.


if ( $entry =~ /RequestId\s+\[([a-fA-F0-9]{5})\]/ ) {
    print "$1\n";
}



# print FILE "$1\n";  # print to file
}
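
To make the point about $1 concrete, here is a minimal sketch with two sample lines from the log excerpt; the variable names and the helper request_id() are hypothetical and exist only for this example.

```perl
use strict;
use warnings;

my $with_id    = ';- RequestId [8252c] LimitVoIPLine.REQ { accountNumber:=W1931627,';
my $without_id = 'phoneNumber:=1234512345 }';

# Unchecked: the second match fails, so $1 still holds the earlier capture.
$with_id    =~ /\[([a-fA-F0-9]{5})\]/;
$without_id =~ /\[([a-fA-F0-9]{5})\]/;
my $stale = $1;

# Checked: hand back $1 only when this particular match succeeded.
sub request_id {
    my ($entry) = @_;
    return $entry =~ /RequestId\s+\[([a-fA-F0-9]{5})\]/ ? $1 : undef;
}

print "stale value after a failed match: $stale\n";
print defined request_id($without_id) ? "found an ID\n" : "no ID on this line\n";
```

The stale value is how the same ID can end up printed for lines that contain no ID at all.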




John
--
Those people who think they know everything are a great
annoyance to those of us who do.-- Isaac Asimov


Re: Simple regex problem has me baffled

2009-01-26 Thread Gunnar Hjalmarsson


Bill Harpley wrote:


[2009-01-23 09:20:48,719]TRACE [server-1] [http-80-5] a...@mydomain.net
:090123-092048567:f5825 (SetCallForwardStatusImpl.java:call:54) -
RequestId [81e80] SetCallForwardStatus.REQ { accountNumber:=W12345,
phoneNumber:=12121212121, onBusyStatus:=true, busyCurrent:=voicemail,
onNoAnswerStatus:=false, noAnswerCurent:=voicemail,
onUncondStatus:=false, uncondCurrent:=voicemail }


Is an entry divided into multiple lines? If so, and if the entries are 
separated by one or more empty lines, you probably want to enable 
paragraph mode.


http://perldoc.perl.org/perlvar.html#$INPUT_RECORD_SEPARATOR


chomp(@list=DATA);


It seems to be unnecessary to read the whole log file into an array. 
chomp()ing seems to be unnecessary, too.



$entry =~ /\[([a-z0-9]{5})\]/;


You'd better check whether the regex matches.

local $/ = '';  # paragraph mode
while ( my $entry = <DATA> ) {
    if ( $entry =~ /\[([a-z0-9]{5})]/ ) {
        print "$1\n";
    }
}
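
A self-contained sketch of paragraph mode, under the assumption that entries really are separated by empty lines. The log text below is hypothetical (in particular the .RES entry is invented), and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical log text: two multi-line entries separated by an empty line.
my $log = join "\n",
    'RequestId [81e80] SetCallForwardStatus.REQ { accountNumber:=W12345,',
    'phoneNumber:=12121212121 }',
    '',
    'RequestId [81e80] SetCallForwardStatus.RES {',
    'Result:=Success }',
    '';

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my @ids;
{
    local $/ = '';    # paragraph mode: one or more empty lines end a record
    while ( my $entry = <$fh> ) {
        push @ids, $1 if $entry =~ /RequestId\s+\[([[:xdigit:]]{5})\]/;
    }
}

print scalar(@ids), " entries: @ids\n";
```

If the file has no empty lines between entries, paragraph mode returns the whole file as a single record, so only the first ID would be found.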


The first thing that puzzles me is that it obviously extracting the
RequestId substring correctly, it seems to complain about the "$1\n"
expression in line 16.
This looks quite OK to me and I am baffled why I am getting this
message.


Probably because your code splits each entry into multiple @list elements.


The other thing that puzzles me is that there can only be a single
REQ/RES pair in the file with a given ID. So the RequestID should not
appear more than twice in the output list. Yet there are many instances
where the RequestID appears more than twice.


$1 retains its value from the latest successful match until the next 
time the regex matches successfully.


--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


Re: Simple regex problem has me baffled

2009-01-26 Thread Rob Dixon

I think I would write

  while (<DATA>) {
    next unless /RequestId \[([[:xdigit:]]+)\]/;
    print "$1\n";
  }

HTH,

Rob


Regex problem with accented characters

2007-03-27 Thread Beginner

Hi,

I am trying to extract the iso code and country name from a 3 column
table (taken from en.wikipedia.org) and have noticed a problem with
accented characters such as Ô.

Below is my script and a sample of the data I am using. When I run
the script the code beginning CI for Côte d'Ivoire returns the string

"CI\tC" whereas I had hoped for "CI\tCôte d'Ivoire"

Does anyone know why \w+ does not include Côte d'Ivoire and how I can get
around it in future?

TIA,
Dp.


 extract.pl 
#!/usr/bin/perl

use strict;
use warnings;

my $file = 'iso-alpha2.txt';

open(FH,$file) or die "Can't open $file: $!\n";
while (<FH>) {
    chomp;
    next if ($_ !~ /^\w{2}\s+/);
    my ($code,$name) = ($_ =~
        /^(\w{2})\s+(\w+\s\w+\s\w+s\w+|\w+\s\w+\s\w+|\w+\s\w+|\w+)/);
    print "$code\t$name\n";
}
===

 sample data 
...snip
BY  Belarus Previously named Byelorussian S.S.R.
BZ  Belize
CA  Canada
CC  Cocos (Keeling) Islands
CD  Congo, the Democratic Republic of the   Previously named Zaire
ZR
CF  Central African Republic
CG  Congo
CH  Switzerland Code taken from Confoederatio Helvetica, its
official Latin name
CI  Côte d'Ivoire
CK  Cook Islands
CL  Chile
CM  Cameroon
===


Re: Regex problem with accented characters

2007-03-27 Thread Mumia W.


On 03/27/2007 03:34 AM, Beginner wrote:

> Does anyone know why \w+ does not include Côte d'Ivoire and how I can get
> around it in future?




It's partly the encoding. Put «use encoding "iso-8859-1";» at the top of
your program, and there will be a little improvement. However, that only
gets you as far as "Côte d"; I doubt there is any encoding where the
apostrophe is in \w.

It's probably best to create an expression that contains all of the
characters you may want. That would include accented characters and the
apostrophe in this case.

Also, I advise you to use a programmer's editor that supports syntax
highlighting. My VIM shows me that you missed the backslash that is
supposed to be on the fourth \s in your regular expression.





Re: Regex problem with accented characters

2007-03-27 Thread Rob Dixon


Beginner wrote:

> Does anyone know why \w+ does not include Côte d'Ivoire and how I can get
> around it in future?


Ordinarily the range of characters mapped by \w is limited to [0-9A-Za-z_].
However, if you put 'use locale' at the start of your program this will be
extended to include the accented alpha characters as well (see perldoc
perllocale).

However, this will still not solve your problem, as the apostrophe in
"Côte d'Ivoire" will still not match \w and you will end up with
"CI\tCôte d". I suggest you change your regex to simply match any
character at all up to the end of the line, like this:

 while (<FH>) {
   chomp;
   next unless /^(\w\w)\s+(.+?)\s*$/;
   my ($code, $name) = ($1, $2);
   print "$code\t$name\n";
 }

which will give the result you desire.

But you still have the problem that the line for Zaire has no text and
will not match the regex anyway!
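
A minimal sketch of an alternative that sidesteps \w for the name column altogether. It assumes the columns are separated by tabs (the archived sample does not make the separator certain) and that the script and data are UTF-8; the rows and the parse_row helper are made up for illustration.

```perl
use strict;
use warnings;
use utf8;    # this source file itself is UTF-8

# Hypothetical sample rows: code, name, optional note, separated by tabs.
my @rows = (
    "BZ\tBelize",
    "CI\tCôte d'Ivoire   ",
    "CH\tSwitzerland\tCode taken from Confoederatio Helvetica",
);

# Return (code, name) for a data row, or an empty list for anything else.
sub parse_row {
    my ($row) = @_;
    # Two word characters, a tab, then everything up to the next tab,
    # with trailing whitespace trimmed from the name.
    return $row =~ /^(\w{2})\t([^\t]+?)\s*(?:\t|$)/ ? ($1, $2) : ();
}

binmode STDOUT, ':encoding(UTF-8)';
for my $row (@rows) {
    my ($code, $name) = parse_row($row) or next;
    print "$code\t$name\n";
}
```

Because the name is defined as "anything that is not a tab", accents, apostrophes, commas and parentheses all pass through without a locale or a hand-built character class.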

Hope this helps.

Rob


Re: Regex problem with accented characters

2007-03-27 Thread Rob Dixon


Beginner wrote:


/^(\w{2})\s+(\w+\s\w+\s\w+s\w+|\w+\s\w+\s\w+|\w+\s\w+|\w+)/);


It's worth noting that this could be written:

/^(\w{2})\s+(\w+(?:\s\w+)*)/);

Rob


Re: strange regex problem with backslash and newline

2006-08-05 Thread Tom Phoenix


On 8/5/06, Peter Daum [EMAIL PROTECTED] wrote:

> $s='abc \
> ';
>
> $s =~ /^(.*[^\\])(\\)?$/; print "1: '$1', 2: '$2'";


Let's see what that pattern matches by annotating it:

 m{
   ^           # start of string
   (           # memory 1
     .*        # any ol' junk, including backslashes
     [^\\]     # any non-backslash, including newlines
   )
   (\\)?       # optional backslash (memory 2)
   $           # end of string (or final newline at eos)
 }x


> I would expect $1 to hold "abc " and $2=="\\", but instead,
> the first grouping holds everything including the backslash
> and the following newline, while $2 is left undefined.
>
> the "." obviously matched the newline at the end.

No, the "." matched the backslash; the "[^\\]" matched the newline.

Does that get you back on the right track? Hope this helps!
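
A small sketch of one way to act on that: exclude the newline from the character class as well, so the class can no longer consume it and the backslash has to land in the second group. The helper name split_continuation is made up for illustration.

```perl
use strict;
use warnings;

# Split a line into its text and an optional trailing continuation backslash.
sub split_continuation {
    my ($line) = @_;
    # [^\\\n] excludes the newline too, so it cannot swallow the final "\n"
    # and leave the backslash inside the first group.
    return $line =~ /^(.*[^\\\n])(\\)?$/ ? ($1, $2) : ();
}

my ($text, $slash) = split_continuation("abc \\\n");
printf "1: '%s', 2: '%s'\n", $text, defined $slash ? $slash : 'undef';
```

Note that a line consisting only of a backslash does not match this pattern at all, so a real parser would need to handle that case separately.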

--Tom Phoenix
Stonehenge Perl Training


Re: strange regex problem with backslash and newline

2006-08-05 Thread Peter Daum

Tom Phoenix wrote:
> On 8/5/06, Peter Daum [EMAIL PROTECTED] wrote:
>
>> $s =~ /^(.*[^\\])(\\)?$/; print "1: '$1', 2: '$2'";
>
> Let's see what that pattern matches by annotating it:
>
>  m{
>    ^           # start of string
>    (           # memory 1
>      .*        # any ol' junk, including backslashes
>      [^\\]     # any non-backslash, including newlines

... h ;-)

I somehow had always assumed that not only "."
but also other constructs (like the "[^\\]", which really
was intended as "[^\\\n]") treat the newline specially and
that only "\n" or "$" match the newline - certainly not
something the Perl documentation says anywhere,
but this was the first time I ever had a situation
where this makes a difference.

Thanks a lot for the explanation!

Regards,
Peter Daum



Re: strange regex problem with backslash and newline

2006-08-05 Thread John W. Krahn

Peter Daum wrote:
 Hi,

Hello,

 when trying to process continuation lines in a file, I ran
 into a weird phenomenon that I can't make any sense of:
 
 $s contains a line read from a file, that ends with a backslash
 (+ the newline character), so
 
 $s='abc \
 ';
 
 $s =~ /^(.*)$/; print $1; # prints abc \ as expected

If what you really want to do is put all the continuation lines on the same
line then you can do it something like this:


while ( my $s = <FILE> ) {
    if ( $s =~ s/\\\n/ / ) {
        $s .= <FILE>;
        redo;
    }

    # process complete line
}
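
The loop above can be tried without a real file. This is a sketch only: it reads from an in-memory filehandle, the sample text and the join_continuations helper are invented for illustration, and it adds a guard for a backslash on the very last line, which the original loop does not have.

```perl
use strict;
use warnings;

# Join every line that ends in a backslash with the line that follows it.
sub join_continuations {
    my ($fh) = @_;
    my @logical;
    while ( my $line = <$fh> ) {
        if ( $line =~ s/\\\n\z/ / ) {
            my $next = <$fh>;
            if ( defined $next ) {
                $line .= $next;
                redo;    # re-run the body on the longer line, no new read
            }
        }
        chomp $line;
        push @logical, $line;
    }
    return @logical;
}

my $text = "one \\\ntwo \\\nthree\nfour\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print "$_\n" for join_continuations($fh);
```

The redo restarts the loop body without evaluating the while condition again, which is what lets the same $line keep growing.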



John
-- 
use Perl;
program
fulfillment


XML Parsing error - regex problem?

2006-03-10 Thread Graeme McLaren


Hi all, I'm getting the following XML parsing error:

[Fri Mar 10 09:37:39 2006] insert_xml.pl: not well-formed (invalid token) at 
line 13628, column 24, byte 413248:

[Fri Mar 10 09:37:39 2006] insert_xml.pl: <la>LA14</la>
[Fri Mar 10 09:37:39 2006] insert_xml.pl: <seed>5741726</seed>
[Fri Mar 10 09:37:39 2006] insert_xml.pl: <school_name>St. Patricks R.C. P.S.</school_name>
[Fri Mar 10 09:37:39 2006] insert_xml.pl: ===^
[Fri Mar 10 09:37:39 2006] insert_xml.pl: <council>Falkirk</council>
[Fri Mar 10 09:37:39 2006] insert_xml.pl: <ce>CE-511 (Edge)</ce>
[Fri Mar 10 09:37:39 2006] insert_xml.pl:  at 
/usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi/XML/Parser.pm line 
185



I've checked my XML file and it contains:


<school_name>St. Patrick<92>s R.C. P.S.</school_name>

This is because St. Patrick's contains an apostrophe.  I have a couple of 
regexes to handle ampersands and apostrophes, however the apostrophe regex 
doesn't appear to be working correctly:



ampersand regex works:

$data->[$i] =~ s/&/&#38;/g;


apostrophe regex doesn't work:

$data->[$i] =~ s/'/&apos;/g;


Any ideas on this one?

G :)

P.S. Thank you to all who replied to my previous post, I got that array 
dereferenced properly.





RE: XML Parsing error - regex problem?

2006-03-10 Thread Graeme McLaren

Hi all, I've worked out that the character is a type of apostrophe which has 
a hex value of 92.  How would I write my regex to substitute this character 
for a normal apostrophe?


I've tried: s/92/'/g;

and it didn't work.


Any ideas?






Re: XML Parsing error - regex problem?

2006-03-10 Thread Tom Phoenix

On 3/10/06, Graeme McLaren [EMAIL PROTECTED] wrote:

> I've checked my XML file and it contains:
>
> <school_name>St. Patrick<92>s R.C. P.S.</school_name>
>
> This is because St. Patrick's contains an apostrophe.

I'm guessing that where I see four characters "<92>", the actual file
has a single character. Some tools render unusual characters that way.

> I have a couple of
> regexes to handle ampersands and apostrophes, however the apostrophe regex
> doesn't appear to be working correctly:
>
> ampersand regex works:
>
> $data->[$i] =~ s/&/&/g;

I'm not sure I know what you mean by "works". It seems to be replacing
every ampersand with an ampersand in the target string, which would be
a no-op if it didn't have side effects.

> apostrophe regex doesn't work:
>
> $data->[$i] =~ s/'/&apos;/g;

It doesn't? It's probably matching any true apostrophes.

> I've worked out that the character is a type of apostrophe which has
> a hex value of 92.  How would I write my regex to substitute this character
> for a normal apostrophe?
>
> I've tried: s/92/'/g;
>
> and it didn't work.

I think you're looking for one of these:

s/\x92/'/g
s/\x92/&apos;/g
tr/\x92/'/

Backslash escapes are documented in perlop. Hope this helps!
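
Putting those substitutions together, a rough sketch of an escaping helper might look like the following. It assumes the input is a Windows-1252 byte string (where 0x92 is the curly apostrophe); the name xml_escape_text and the sample string are invented for illustration, and this is not a replacement for a real XML writer module.

```perl
use strict;
use warnings;

# Escape a text value for use inside an XML element.
sub xml_escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # ampersands first, so later entities survive
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ tr/\x92/'/;      # fold the 0x92 apostrophe into a plain one
    $text =~ s/'/&apos;/g;
    return $text;
}

print xml_escape_text("St. Patrick\x92s R.C. P.S. & Nursery"), "\n";
```

The order matters: escaping "&" after the other entities had been produced would turn "&apos;" into "&amp;apos;".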

--Tom Phoenix
Stonehenge Perl Training


Re: regex problem

2006-02-16 Thread Jay Savage

On 2/15/06, anand kumar [EMAIL PROTECTED] wrote:


 John W. Krahn [EMAIL PROTECTED] wrote:anand kumar wrote:
  Hi all,

 Hello,

  I have the following problem in the following regex replace.
 
  $line=~s!\b($name)\b!$1!g;
 
  here this regex finds the exact matching of the content in $name and does
  the needed but in some examples the variable $name may contain backslash
  characters like 'gene\l=s\' , in this type of cases the replace string does
  not work so i have removed '\b' on either side and used the following
 
  $line=~s!(\Q$name\E)!$1!g;
 
  This works fine but the problem is that the replacement is not done on the
  exact word but also on substrings which is unnecessary.
 
  if i use both \b\b and \Q\E then the code fails to replace.
 
  please send suggestions in this regard

 $line=~s!\b(\Q$name\E)\b!$1!g;

Try this: if $name is a single-quoted string:

$name = quotemeta($name);
$line =~ s|($name)|au$1|;

If $name is a double-quoted string:

   $name = quotemeta(quotemeta($name));
   $line =~ s|($name)|au$1|;

It's preferable, though, for $name to be single-quoted, because Perl
will do some interpolation at the time the string is saved, and
depending on your system, strange things can happen. For instance, the
following are not all equal:

$name = quotemeta("\aball");    # $name gets '\\x07ball'
$name = '\aball';               # $name gets '\aball'
$name = "\aball"; $name = quotemeta($name);
                                # $name gets '\\x07ball'

This is because the double-quoted string is interpolated before it is
assigned to a variable or passed to a function and the
metacharacter--in this case '\a', the escape sequence for the ASCII
bell character--is already interpolated.
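
A further possibility, offered as a sketch rather than as what anyone in the thread used: \b only matches between a word character and a non-word character, so when $name ends in a backslash, the trailing \b succeeds only if a word character follows. That is a likely reason the \b...\Q...\E...\b combination fails to replace. Lookarounds that simply forbid an adjacent word character avoid this. The <au> wrapper is an assumption, since the replacement markup is not fully preserved in the archived posts, and tag_name is a made-up helper.

```perl
use strict;
use warnings;

# Wrap whole-token occurrences of $name in a hypothetical <au> element.
sub tag_name {
    my ($line, $name) = @_;
    # (?<!\w) / (?!\w): no word character directly before or after the match.
    $line =~ s{(?<!\w)(\Q$name\E)(?!\w)}{<au>$1</au>}g;
    return $line;
}

my $name = 'gene\l=s\\';    # the characters: gene\l=s\
print tag_name('see gene\l=s\ here', $name), "\n";
print tag_name('see pregene\l=s\ here', $name), "\n";
```

The second call leaves the line alone because the name is glued to a preceding word character, which is the substring case the original poster wanted to exclude.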

HTH,

-- jay
--
This email and attachment(s): [  ] blogable; [ x ] ask first; [  ]
private and confidential

daggerquill [at] gmail [dot] com
http://www.tuaw.com  http://www.dpguru.com  http://www.engatiki.org

values of β will give rise to dom!

Re: regex problem

2006-02-15 Thread anand kumar



John W. Krahn [EMAIL PROTECTED] wrote:anand kumar wrote:
 Hi all,

Hello,

 I have the following problem in the following regex replace.
 
 $line=~s!\b($name)\b!$1!g;
 
 here this regex finds the exact matching of the content in $name and does
 the needed but in some examples the variable $name may contain backslash
 characters like 'gene\l=s\' , in this type of cases the replace string does
 not work so i have removed '\b' on either side and used the following
 
 $line=~s!(\Q$name\E)!$1!g;
 
 This works fine but the problem is that the replacement is not done on the
 exact word but also on substrings which is unnecessary.
 
 if i use both \b\b and \Q\E then the code fails to replace.
 
 please send suggestions in this regard

$line=~s!\b(\Q$name\E)\b!$1!g;



John
-- 
use Perl;
program
fulfillment
   
  Hi john,
   
   i have tried the above method but the replace ment is done .
  regards
  anand



regex problem

2006-02-14 Thread anand kumar

Hi all,
   
   I have the following problem in the following regex replace. 
   
  $line=~s!\b($name)\b!au$1!g;
   
  here this regex finds the exact matching of the content in $name and does the 
needed but in some examples the variable $name may contain backslash characters 
like 'gene\l=s\' , in this type of cases the replace string does not work so i 
have removed '\b' on either side and used the following
   
  $line=~s!(\Q$name\E)!au$1!g;
   
  This works fine but the problem is that the replacement is not done on the 
exact word but also on substrings which is unnecessary.
   
  if i use both \b\b and \Q\E then the code fails to replace.
   
  please send suggestions in this regard
   
  Thanks in advance for the help
   
  Regards
  Anand
   



Re: regex problem

2006-02-14 Thread John W. Krahn

anand kumar wrote:
 Hi all,

Hello,

 I have the following problem in the following regex replace.
 
 $line=~s!\b($name)\b!au$1!g;
 
 here this regex finds the exact matching of the content in $name and does
 the needed but in some examples the variable $name may contain backslash
 characters like 'gene\l=s\' , in this type of cases the replace string does
 not work so i have removed '\b' on either side and used the following
 
 $line=~s!(\Q$name\E)!au$1!g;
 
 This works fine but the problem is that the replacement is not done on the
 exact word but also on substrings which is unnecessary.
 
 if i use both \b\b and \Q\E then the code fails to replace.
 
 please send suggestions in this regard

$line=~s!\b(\Q$name\E)\b!au$1!g;



John
-- 
use Perl;
program
fulfillment


Regex Problem.

2005-08-18 Thread Sara

I am at a loss here to generate REGEX for my problem.

I have an input query coming to my cgi script, containing a word (with or without
spaces, e.g. "blood", "Globin Test" etc).
What I am trying to do is to split this word (maximum of 3 characters) and find
the BEST possible matching words within mySQL database. For example if the word
is "blood"

I want to get results using regex: 

for blood: check(blo) then check(loo)  check(ood)
for Globin Test: check(Glo) then check(lob)  check(obi) check(bin) 
check(Tes) check(est)

TIA.

Sara.

sub check {
    my $check = $dbh->prepare("SELECT * FROM medical WHERE def LIKE '%$query%'");
    $check->execute();
    while (my @row = $check->fetchrow_array()) {
        print "blah blah blah\n";
    }
}

Re: Regex Problem.

2005-08-18 Thread Wiggins d'Anconia

Sara wrote:
 I am at a loss here to generate REGEX for my problem.
 
 I have an input query coming to my cgi script, containg a word (with or 
 without spaces e.g. blood Globin Test etc).
 What I am trying to do is to split this word (maximum of 3 characters) and 
 find the BEST possible matching words within mySQL database. For example if 
 the word is blood
 
 I want to get results using regex: 
 
 for blood: check(blo) then check(loo)  check(ood)
 for Globin Test: check(Glo) then check(lob)  check(obi) check(bin) 
 check(Tes) check(est)
 
 TIA.


Sounds like you need a split then a substr rather than a regex,
though I suppose it would work if you really wanted one, I wouldn't.

perldoc -f split
perldoc -f substr

It will also be faster to combine everything into one select rather than
for each possible token, but at the least if you are going to do
multiple selects use 'prepare' with placeholders and only prepare the
query once.

So,

-- UNTESTED --

my @tokens = split ' ', $entry;
my @words;
foreach my $token (@tokens) {
  push @words, substr $token, 0, 3;
  push @words, substr $token, -3, 3;
}

(or you can put the following into the above foreach however you would like)

my $where = '';
my @bind;
foreach my $word (@words) {
  $where .= ' OR ' if $where ne '';
  $where .= "(def LIKE ?)";
  push @bind, "%$word%";
}

my $sth = $dbh->prepare("SELECT * FROM medical WHERE $where");
$sth->execute(@bind);

while (my @row = $sth->fetchrow_array) {
  print join ' ', @row;
  print "\n";
}

This also prevents SQL injection by quoting the query words properly.
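
The code above takes only the first and last three characters of each word. If every three-character window is wanted, as in the check(blo), check(loo), check(ood) example from the question, a sketch along these lines might do. It needs no database connection: it only builds the WHERE clause and bind values. The helper names fragments and like_clause are made up, and the table and column names are taken from the question.

```perl
use strict;
use warnings;

# Every run of $size consecutive characters in each whitespace-separated word.
sub fragments {
    my ($query, $size) = @_;
    my @found;
    for my $word (split ' ', $query) {
        # The range is empty when the word is shorter than $size.
        for my $start (0 .. length($word) - $size) {
            push @found, substr $word, $start, $size;
        }
    }
    return @found;
}

# Build a WHERE clause with one placeholder per fragment, plus bind values.
sub like_clause {
    my (@frags) = @_;
    my $where = join ' OR ', ('def LIKE ?') x @frags;
    my @bind  = map { "%$_%" } @frags;
    return ($where, @bind);
}

my ($where, @bind) = like_clause(fragments('Globin Test', 3));
print "SELECT * FROM medical WHERE $where\n";
print "bind: @bind\n";
```

With a live handle, the clause and values would go to $dbh->prepare and $sth->execute(@bind) as shown above. An empty fragment list yields an empty clause, so a caller would need to check for that before running the query.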

 Sara.


http://danconia.org


Re: Regex Problem.

2005-08-18 Thread Sara


That's worked like a charm, You ALL are great.

Thanks everyone for help.

Sara.


- Original Message - 
From: [EMAIL PROTECTED]

To: 'Sara' [EMAIL PROTECTED]
Sent: Thursday, August 18, 2005 10:50 PM
Subject: RE: Regex Problem.



Hi Sara,

what about something like

$string = 'blood';
for ($i = 0; $i <= length($string) - 3; $i++) {
    check(substr($string, $i, 3));
}

Kind regards
   Your echtwahr.Webmaster

http://www.echtwahr.de
http://www.echtwahr.com


regex problem

2005-07-01 Thread Moon, John

The following is not returning what I had expected...

SUN1-BATCH>perl -e '$a=q{/var/run}; $home=q{/var/123};print "Yes - $a like $home\n" if $a =~ /^$home/;'
SUN1-BATCH>perl -e '$a=q{/var/run}; $home=q{/var/ra};print "Yes - $a like $home\n" if $a =~ /^$home/;'
SUN1-BATCH>perl -e '$a=q{/var/run}; $home=q{/var/ru};print "Yes - $a like $home\n" if $a =~ /^$home/;'
Yes - /var/run like /var/ru


I would have assumed that "/var/run" would NOT be like "/var/ru" just as
"/var/run" is not like "/var/ra"...

John W Moon



Re: regex problem

2005-07-01 Thread Chris Devers

On Fri, 1 Jul 2005, Moon, John wrote:

> The following is not returning what I had expected...
>
> $a    = q{/var/run};
> $home = q{/var/ru};
> print "Yes - $a like $home\n" if $a =~ /^$home/;
>
> I would have assumed that "/var/run" would NOT be like "/var/ru" just
> as "/var/run" is not like "/var/ra"...

It depends what you mean by "like".

In this case, the string in $home also appears as part of $a, so in that
sense they are alike, and the code is doing the right thing.

If you want to verify that $a and $home are identical, it would be
easier to just check if one `eq` the other, as

print "Yes - $a like $home\n" if $a eq $home;

That test will work if $home is '/var/run', but will fail on '/var/ru'. 


-- 
Chris Devers


Re: regex problem

2005-07-01 Thread Ing. Branislav Gerzo

Moon, John [MJ], on Friday, July 1, 2005 at 11:30 (-0400 ) contributed
this to our collective wisdom:

MJ I would have assumed  that /var/run would NOT be like /var/ru just as
MJ /var/run is not like /var/ra...

is /var/ru at the beginning of /var/run ? yes.

-- 

 ...m8s, cu l8r, Brano.

[If they get too annoying then we'll just have to get violent...]




Re: regex problem

2005-07-01 Thread Jay Savage

On 7/1/05, Moon, John [EMAIL PROTECTED] wrote:
> I would have assumed that "/var/run" would NOT be like "/var/ru" just as
> "/var/run" is not like "/var/ra"...
 

John

A regex match checks to see if the specified pattern appears in the
specified string. And the answer to the question "is /var/ru in
/var/run?" is yes. Or to put it another way:

   $a =~ /$home/

is functionally (although not procedurally) equivalent to:

   $a =~ /^.*$home.*$/

If you want to do a simple test for equality, use 'eq'. If you're
going to test for a pattern and want to match on the entire string,
anchor the pattern at the beginning and end of the string:

   $a =~ /^$home$/

but if $home is a simple string without regex metacharacters, 'eq' is
going to be a lot faster than m//.
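
If the real question was "does this path live in that directory?", which is only a guess at the intent, neither a bare prefix match nor 'eq' is quite right. A sketch of a middle ground: quote the directory with \Q...\E so it is taken literally, and require either a slash or the end of the string after it. The helper name is_under is invented, and $dir is assumed to have no trailing slash.

```perl
use strict;
use warnings;

# True when $path is $dir itself or lies somewhere beneath it.
# Assumes $dir is given without a trailing slash.
sub is_under {
    my ($path, $dir) = @_;
    return $path =~ m{^\Q$dir\E(?:/|\z)} ? 1 : 0;
}

for my $dir ('/var/ru', '/var/run', '/var') {
    printf "%-9s %s\n", $dir, is_under('/var/run', $dir) ? 'yes' : 'no';
}
```

Here /var/ru is rejected because the character after the prefix is "n", not a slash or the end of the path.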

HTH,

-- jay

daggerquill [at] gmail [dot] com
http://www.tuaw.com
http://www.dpguru.com
http://www.engatiki.org


one-liner multi-line regex problem

2005-04-25 Thread Kevin Horton

I'm trying to write a perl one-liner that will edit an iCalendar 
format file to remove To Do items.  The file contains several 
thousand lines, and I need to remove several multi-line blocks.  The 
blocks to remove start with a line "BEGIN:VTODO" (without the quotes)
and end with a line "END:VTODO" (also without quotes).

I've tried the following one-liner,
perl -p -i.bak -e 's/BEGIN:VTODO.*END:VTODO//sg' file_name_to_edit
The .bak file is created, which tells me the one-liner is finding my 
file, but the file is identical to the old one - i.e. the regex 
doesn't seem to be matching anything.

I'm also wondering whether my proposed one-liner (if it worked) would 
be too greedy.  Would it pull out everything between the first 
BEGIN:VTODO and the last END:VTODO?

I'd appreciate any hints.
Thanks,
Kevin Horton

Re: one-liner multi-line regex problem

2005-04-25 Thread John Doe

Hi Kevin

just hints, no solution :-)

On Monday, 25 April 2005 at 12:59, Kevin Horton wrote:
> I've tried the following one-liner,
>
> perl -p -i.bak -e 's/BEGIN:VTODO.*END:VTODO//sg' file_name_to_edit

According to perldoc perlrun, -p reads _one_ line after the other, so you
can't search for multiline patterns this way.

> I'm also wondering whether my proposed one-liner (if it worked) would
> be too greedy.

Yes or no, depending on the working implementation :-)

> Would it pull out everything between the first
> BEGIN:VTODO and the last END:VTODO?

Yes, if you match the regex above against a string that holds the whole
file.


Re: one-liner multi-line regex problem

2005-04-25 Thread Dave Gray

 I'm trying to write a perl one-liner that will edit an iCalendar
 format file to remove To Do items.  The file contains several
 thousand lines, and I need to remove several multi-line blocks.  The
 blocks to remove start with a line BEGIN:VTODO (without the quotes)
 and end with a line END:VTODO (also without quotes).
 
 I've tried the following one-liner,
 
 perl -p -i.bak -e 's/BEGIN:VTODO.*END:VTODO//sg' file_name_to_edit

Assuming you have enough disk space:

perl -ane 'print unless /^BEGIN:VTODO/ .. /^END:VTODO/' < old > new

See perldoc perlrun for more info on perl's command line parameters.
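
For comparison, here is a sketch of the two approaches side by side in script form: filtering line by line with the range operator, and removing blocks from a slurped string with a non-greedy match. The calendar text is invented sample data and both helper names are made up.

```perl
use strict;
use warnings;

# Invented sample: two VTODO blocks around one VEVENT that must survive.
my $ics = join "\n",
    'BEGIN:VCALENDAR',
    'BEGIN:VTODO', 'SUMMARY:first', 'END:VTODO',
    'BEGIN:VEVENT', 'SUMMARY:keep me', 'END:VEVENT',
    'BEGIN:VTODO', 'SUMMARY:second', 'END:VTODO',
    'END:VCALENDAR', '';

# Line by line: the range operator is true from a BEGIN line through its END line.
sub strip_todos_by_line {
    my ($text) = @_;
    return join '', grep { !(/^BEGIN:VTODO/ .. /^END:VTODO/) } split /^/, $text;
}

# Whole string: the non-greedy .*? stops at the nearest END:VTODO.
sub strip_todos_slurped {
    my ($text) = @_;
    (my $copy = $text) =~ s/^BEGIN:VTODO.*?^END:VTODO\r?\n//msg;
    return $copy;
}

print strip_todos_by_line($ics);
```

With a greedy .* in the second helper, the match would run from the first BEGIN:VTODO to the last END:VTODO and take the VEVENT with it. The line-by-line version never holds more than one line at a time, which matters for large files.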


Re: one-liner multi-line regex problem

2005-04-25 Thread Kevin Horton

On 25-Apr-05, at 10:06 AM, Jay Savage wrote:
> On 4/25/05, Kevin Horton [EMAIL PROTECTED] wrote:
>> I'm trying to write a perl one-liner that will edit an iCalendar
>> format file to remove To Do items.  The file contains several
>> thousand lines, and I need to remove several multi-line blocks.  The
>> blocks to remove start with a line "BEGIN:VTODO" (without the quotes)
>> and end with a line "END:VTODO" (also without quotes).
>> I've tried the following one-liner,
>> perl -p -i.bak -e 's/BEGIN:VTODO.*END:VTODO//sg' file_name_to_edit
>> The .bak file is created, which tells me the one-liner is finding my
>> file, but the file is identical to the old one - i.e. the regex
>> doesn't seem to be matching anything.
>
> -p causes the file to be read one line at a time, which negates the
> usefulness of /s.  If you have sufficient RAM to read the entire file
> into memory, you can use the -0 option to slurp the file:
>
>    perl -0777 -p -i.bak -e 's/BEGIN:VTODO.*?END:VTODO//sg'

This seems to work perfectly.  I've studied the output for five
minutes, and can't find a problem.

Thank you very much.

> see perldoc perlrun for details

I've learned a lot in the last few minutes, now that I know which of
the perldoc files to look in.

>> I'm also wondering whether my proposed one-liner (if it worked) would
>> be too greedy.  Would it pull out everything between the first
>> BEGIN:VTODO and the last END:VTODO?
>
> Yes it will.

I looked at trying to use the ? to stop the potential greediness, but
I didn't grok how it worked.  Now that I have an example, I think I
understand it (again, as I thought I understood when I was first
puzzling through perl on vacation in Christmas 2003).  Hopefully my
understanding this time is more lasting. :)

Thanks so much to the several people who responded.
Kevin Horton
Ottawa, Canada
RV-8 - Finishing Kit
http://www.kilohotel.com/rv8
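
The difference between the greedy and the non-greedy form discussed above can be seen on a small slurped string. This is only a sketch: the sample calendar text is made up, and the trailing \n? in the patterns is an addition (not part of the one-liners in the thread) so that no empty lines are left behind.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for a slurped iCalendar file.
my $ics = join "\n",
    'BEGIN:VCALENDAR',
    'BEGIN:VTODO', 'SUMMARY:first', 'END:VTODO',
    'BEGIN:VEVENT', 'SUMMARY:keep me', 'END:VEVENT',
    'BEGIN:VTODO', 'SUMMARY:second', 'END:VTODO',
    'END:VCALENDAR', '';

# Greedy: .* runs from the first BEGIN:VTODO to the last END:VTODO,
# so the VEVENT in between is deleted as well.
(my $greedy = $ics) =~ s/BEGIN:VTODO.*END:VTODO\n?//sg;

# Non-greedy: .*? stops at the nearest END:VTODO, one block per match.
(my $lazy = $ics) =~ s/BEGIN:VTODO.*?END:VTODO\n?//sg;

print "greedy:\n$greedy\nnon-greedy:\n$lazy";
```

With -0777 the whole file arrives in $_ as one string like $ics here, which is why /s can then let . cross line boundaries.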

Re: YA Regex problem: lookahead assertion

2005-03-24 Thread Jan Eden

Offer Kaye wrote on 23.03.2005:

> Change your RE to: m#<h1>(.+?)</h1>(.+?)(?=<h1>|$)#gs
>
> In other words, look ahead to either a <h1> or the end of the string
> ($). I have to admit this problem wasn't as simple as I initially
> thought - I still have no idea why my first guess didn't work:
> m#<h1>(.+?)</h1>(.+?)(?=<h1>)?#gs
>
> Maybe someone with more knowledge of REs can answer?

John W. Krahn wrote on 23.03.2005:

> This should work (untested)
>
> while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>|\z)#gs) {


Hi,

and thanks. I tried Offer Kaye's first guess, too, and I think I can explain
why it does not work.

If you make the lookahead optional, the regex will try to match as few
characters as possible for the second parentheses - and since the lookahead is
optional, this will be only a single character.

You have to force a positive lookahead assertion to make sure $2 receives
everything up to either the next <h1> or the end of the string.

So the other suggestion works. Thank you! The reason I had not tried that was 
the wrong assumption that alternations in lookahead/lookbehind assertions had 
to be of the same length, like in (?=abc|def), but not (?=abc|defg). But now I 
remember that the whole lookahead/lookbehind has to be of a fixed length, so 
you cannot use quantifiers.

Thanks again,

Jan
-- 
A common mistake that people make when trying to design something completely 
foolproof is to underestimate the ingenuity of complete fools.


Re: YA Regex problem: lookahead assertion

2005-03-24 Thread John W. Krahn

Jan Eden wrote:
> John W. Krahn wrote on 23.03.2005:
>> This should work (untested)
>>
>> while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>|\z)#gs) {
>
> and thanks. I tried Offer Kaye's first guess, too, and I think I
> can explain why it does not work.
>
> If you make the lookahead optional, the regex will try to match as
> few characters as possible for the second parentheses - and since
> the lookahead is optional, this will be only a single character.
>
> You have to force a positive lookahead assertion to make sure $2
> receives everything up to either the next <h1> or the end of the
> string.
>
> So the other suggestion works. Thank you! The reason I had not tried
> that was the wrong assumption that alternations in
> lookahead/lookbehind assertions had to be of the same length, like
> in (?=abc|def), but not (?=abc|defg). But now I remember that the
> whole lookahead/lookbehind has to be of a fixed length, so you cannot
> use quantifiers.

lookahead CAN use quantifiers but lookbehind CANNOT.


John
--
use Perl;
program
fulfillment
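
The effect of a required versus an optional lookahead, as explained in this thread, can be checked with a short script. It is a sketch only: the sample $content is invented, since the real file is not shown.

```perl
use strict;
use warnings;

# Hypothetical sample document; the real file is not shown in the thread.
my $content = "<h1>One</h1>\nfirst body\n<h1>Two</h1>\nlast body\n";

# Required lookahead: $2 must extend to the next <h1> or to end of string.
my @sections;
while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>|\z)#gs) {
    push @sections, [ $1, $2 ];
}

# Optional lookahead: (?=<h1>)? always succeeds, so the lazy (.+?)
# is already satisfied by a single character.
my @short;
while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>)?#gs) {
    push @short, [ $1, $2 ];
}

printf "%s: %d chars with required lookahead, %d with optional\n",
    $sections[$_][0], length $sections[$_][1], length $short[$_][1]
    for 0 .. $#sections;
```

Both loops find the same headings; only the amount of text captured into $2 differs.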

YA Regex problem: lookahead assertion

2005-03-23 Thread Jan Eden

Hi,

I use the following regex to split a (really simple) file into sections headed
by <h1>.+?</h1>:

while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>)#gs) {
...
}

This works perfectly, but obviously does not catch the last section, as it is
not followed by <h1>.

How can I catch the last section without

* doing a separate match for it
* losing the convenience of the g switch to wade through the whole file?

Thanks,

Jan
-- 
I'd never join any club that would have the likes of me as a member. - Groucho 
Marx


Re: YA Regex problem: lookahead assertion

2005-03-23 Thread Offer Kaye

On Wed, 23 Mar 2005 17:06:59 +0100, Jan Eden wrote:
> Hi,
>
> I use the following regex to split a (really simple) file into sections
> headed by <h1>.+?</h1>:
>
> while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>)#gs) {
> ...
> }
>
> This works perfectly, but obviously does not catch the last section, as it is
> not followed by <h1>.
>
> How can I catch the last section without
>
> * doing a separate match for it
> * losing the convenience of the g switch to wade through the whole file?
>
> Thanks,
>
> Jan

Change your RE to:
m#<h1>(.+?)</h1>(.+?)(?=<h1>|$)#gs

In other words, look ahead to either a <h1> or the end of the string ($).
I have to admit this problem wasn't as simple as I initially thought -
I still have no idea why my first guess didn't work:
m#<h1>(.+?)</h1>(.+?)(?=<h1>)?#gs

Maybe someone with more knowledge of REs can answer?

Regards,
-- 
Offer Kaye


RE: YA Regex problem: lookahead assertion

2005-03-23 Thread Charles K. Clarkson

Jan Eden mailto:[EMAIL PROTECTED] wrote:
: Hi,
: 
: I use the following regex to split a (really simple) file into
: sections headed by <h1>.+?</h1>:
: 
: while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>)#gs) {
: ...
: }

 The answer may be in your description. Use 'split'. When you
use a capture inside the regular expression in 'split', the
capture is returned. @content is 'shift'ed to rid the first empty
element (or filled if there is something before the first <h1>)
returned by split.


#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper 'Dumper';

# Slurp everything after __END__ in one read.
my $content = do{ local $/ = undef; <DATA>; };

# The capture makes split return each heading along with the text after it.
my @content = split m|<h1>(.+?)</h1>|, $content;
shift @content;

print Dumper \@content;

__END__
<h1>heading 1</h1>
Some stuff
<h1>heading 2</h1>
Some stuff
<h1>heading 3</h1>
Some stuff
<h1>heading 4</h1>
Some stuff


HTH,

Charles K. Clarkson
-- 
Mobile Homes Specialist
254 968-8328



Re: YA Regex problem: lookahead assertion

2005-03-23 Thread John W. Krahn

Jan Eden wrote:
> Hi,

Hello,

> I use the following regex to split a (really simple) file into sections headed by
> <h1>.+?</h1>:
>
> while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>)#gs) {
> ...
> }
>
> This works perfectly, but obviously does not catch the last section, as it is not
> followed by <h1>.
>
> How can I catch the last section without
>
> * doing a separate match for it
> * losing the convenience of the g switch to wade through the whole file?

This should work (untested)

while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>|\z)#gs) {


John
--
use Perl;
program
fulfillment

Re: A regex problem.

2004-11-09 Thread William C. Bruce

[EMAIL PROTECTED] (Denham Eva) wrote in 
news:[EMAIL PROTECTED]:

 Hello Gurus,
 In a script I have a piece of code as such:-
 * snip**
 my $filedate =~ s/(\d+)//g;
 * snip end ***
 The data I am parsing looks as such :-
 ** DATA 
 
 C:/directory/MSISExport_20040814.csv
 
 C:/directory/MSISExport_20040813.csv
 
 .
 
 C:/directory/MSISExport_20030501.csv
 
 ** DATA end *
 Now I am actually trying to dump everything except the date or numerals
 as such :- 20040814
 
 Can someone help me with that regex? I am having a frustrating time of
 it!
 
 Much appreciated
 
 Denham

Denham,

If you have the filename in a scalar called $filename then the code to
place the date into a scalar called $filedate would be:

($filedate) = $filename =~ m|MSISExport_([0-9]+)\.csv|;

This places the value captured by ([0-9]+) into $filedate.  It does not
handle a malformed filename: if the date is missing, the match fails and
$filedate is left undefined.

Hope this helps,

Bill
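
One way to cover the malformed-filename case mentioned above is to test whether the match succeeded. A minimal sketch; the sub name file_date and the second sample path are made up for illustration:

```perl
use strict;
use warnings;

# Return the digits between "MSISExport_" and ".csv", or undef when the
# name does not have that shape. The sub name is hypothetical.
sub file_date {
    my ($filename) = @_;
    return $filename =~ m|MSISExport_([0-9]+)\.csv\z| ? $1 : undef;
}

for my $name ('C:/directory/MSISExport_20040814.csv',
              'C:/directory/readme.txt') {
    my $date = file_date($name);
    print defined $date ? "$name => $date\n" : "$name => no date found\n";
}
```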


A regex problem.

2004-09-06 Thread Denham Eva

Hello Gurus,

In a script I have a piece of code as such:-

* snip**

my $filedate =~ s/(\d+)//g;

* snip end ***

The data I am parsing looks as such :-

** DATA 
C:/directory/MSISExport_20040814.csv
C:/directory/MSISExport_20040813.csv
.
.
.
.
C:/directory/MSISExport_20030501.csv
** DATA end *

Now I am actually trying to dump everything except the date or numerals
as such :- 20040814

Can someone help me with that regex? I am having a frustrating time of
it!

Much appreciated

Denham

RE: A regex problem.

2004-09-06 Thread Jaffer Shaik

Hi,

Try in this way. Just remove my, you will get it.

$filedate = "C:/directory/MSISExport_20040814.csv";
($filedate) =~ s/(\_\d+)//g;
print "$filedate\n";

Thank you
jaffer


Re: A regex problem.

2004-09-06 Thread Flemming Greve Skovengaard

Denham Eva wrote:
> Hello Gurus,
>
> In a script I have a piece of code as such:-
>
> * snip**
>
> my $filedate =~ s/(\d+)//g;

Try this instead:

my $filedate;
if( $var_with_file_name =~ m/(\d+)\.csv$/ ) {
    $filedate = $1;
}
print "$filedate\n" if defined $filedate;

> * snip end ***
>
> The data I am parsing looks as such :-
>
> ** DATA 
> C:/directory/MSISExport_20040814.csv
> C:/directory/MSISExport_20040813.csv
> .
> .
> .
> .
> C:/directory/MSISExport_20030501.csv
> ** DATA end *
>
> Now I am actually trying to dump everything except the date or numerals
> as such :- 20040814
> Can someone help me with that regex? I am having a frustrating time of
> it!
>
> Much appreciated
> Denham


--
Flemming Greve Skovengaard Man still has one belief,
a.k.a Greven, TuxPower One decree that stands alone
[EMAIL PROTECTED]The laying down of arms
4112.38 BogoMIPS   Is like cancer to their bones

Re: A regex problem.

2004-09-06 Thread Gunnar Hjalmarsson

Jaffer Shaik wrote:
> Try in this way. Just remove my, you will get it.

What kind of stupid advice is that?

> $filedate = "C:/directory/MSISExport_20040814.csv";
> ($filedate) =~ s/(\_\d+)//g;

Leaving aside that the parentheses are redundant, that does the opposite
of what the OP asked for.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: A regex problem.

2004-09-06 Thread Ing. Branislav Gerzo

Denham Eva [DE], on Monday, September 6, 2004 at 14:41 (+0200) typed:

DE my $filedate =~ s/(\d+)//g;
DE ** DATA 
DE C:/directory/MSISExport_20040814.csv
DE C:/directory/MSISExport_20040813.csv
DE Can someone help me with that regex? I am having a frustrating time of

I hope this helps you:

use strict;
for (<DATA>) {
   print "$1\n" if /MSISExport_(\d+)\.csv$/gi;
}

__DATA__
C:/directory/MSISExport_20040814.csv
C:/directory/MSISExport_20040816.csv
C:/directory/MSISExport_20040817.csv
C:/directory/MSISExport_20040824.csv


-- 

 ...m8s, cu l8r, Brano.

[Paragraph. Paragraph. Paragraph. Paragraph. Paragraph. -David Moser]






RE: regex problem

2004-08-10 Thread DBSMITH

Cool, thanks. I guess I am a wannabe programmer but do UNIX in real
life.
So Data::Dumper shows me the structure of any scalar?  Could you show me an
example?

thank you, 

Derek B. Smith
OhioHealth IT
UNIX / TSM / EDM Teams







RE: regex problem

2004-08-10 Thread Chris Devers

On Tue, 10 Aug 2004 [EMAIL PROTECTED] wrote:

> So Data::Dumper shows me a structure of any scalar?  Could you show me
> an example?

Data::Dumper is a tool for showing the structure of *any* data.

As is often the case, the perldoc has some of the best documentation:

    perldoc Data::Dumper

It starts out with this:

    NAME
       Data::Dumper - stringified perl data structures, suitable
       for both printing and eval

    SYNOPSIS
       use Data::Dumper;

       # simple procedural interface
       print Dumper($foo, $bar);

       # extended usage with names
       print Data::Dumper->Dump([$foo, $bar], [qw(foo *ary)]);

       # configuration variables
       {
         local $Data::Dumper::Purity = 1;
         eval Data::Dumper->Dump([$foo, $bar], [qw(foo *ary)]);
       }

       # OO usage
       $d = Data::Dumper->new([$foo, $bar], [qw(foo *ary)]);
          ...
       print $d->Dump;
          ...
       $d->Purity(1)->Terse(1)->Deepcopy(1);
       eval $d->Dump;

And goes on to describe usage details & more examples.

Good luck with it!

--
Chris Devers
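
As a concrete case, the two "scalar with strings in it" shapes from earlier in this thread can be dumped side by side. A small sketch using only the procedural Dumper() call from the synopsis above; the variable names are made up:

```perl
use strict;
use warnings;
use Data::Dumper;

# Two different things that could both be described as
# "a scalar with strings in it".
my $as_ref    = [ [ 'foo' ], [ 'bar' ] ];   # reference to nested arrays
my $as_string = "foo\nbar\n";               # one string with embedded newlines

# Dumper shows the nesting for the reference and a plain quoted
# string for the second scalar.
print Dumper($as_ref);
print Dumper($as_string);
```

The first dump shows nested brackets, the second a single quoted string, which is the distinction the earlier reply was asking for.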

regex problem

2004-08-09 Thread DBSMITH

All, I am getting this error from my if statement:

^* matches null string many times in regex; marked by <-- HERE in m/^* <-- 
HERE Orig/ at .

I am trying to get everything except *Orig in this output :

*Orig Vol: 1703FBBDED58D4AD (E00117), Seq #: 000114 in TLU: st_9840_acs_0, 
media: STK 984e
 Orig Vol: 0303E68522777483 (E00486), Seq #: 000800 in TLU: st_9840_acs_0, 
media: STK 984e

07/12/2004 18:13:17 Rotation ID:4A03CC27.A30DEE72.0200.0E0B8707, 5 
backups
 Media duplication is not enabled.

*Orig Vol: 4A03CC27A30DEE72 (E00632), Seq #: 000273 in TLU: st_9840_acs_0, 
media: STK 984e

Here is my code:
 
foreach ($EDM_nonactive_tapelist) {
    if ($EDM_nonactive_tapelist !~ "\^\*Orig") {
        print $_;
    }
}

*NOTE the variable $EDM_nonactive_tapelist has the Orig strings in it.
Does foreach read line by line?
Do I even need the foreach statement?

thank you!

Derek B. Smith
OhioHealth IT
UNIX / TSM / EDM Teams

Re: regex problem

2004-08-09 Thread Gunnar Hjalmarsson

[EMAIL PROTECTED] wrote:
> All I am getting the error from my if statement:
>
> ^* matches null string many times in regex; marked by
> <-- HERE in m/^* <-- HERE Orig/ at .
>
> I am trying to get everything except *Orig in this output :

<sample data snipped>

> Here is my code:
>
> foreach ($EDM_nonactive_tapelist) {
>     if ($EDM_nonactive_tapelist !~ "\^\*Orig") {
>         print $_;
>     }
> }

- The ^ character must not be escaped when marking the beginning of a
string.

- You need to tell Perl that you want to use the m// operator, either like

    m"^\*Orig"

or by using straight slashes:

    /^\*Orig/

But why use a regex at all?

    print unless substr($_, 0, 5) eq '*Orig';

> *NOTE the variable $EDM_nonactive_tapelist has the Orig strings in it.
> Does foreach read line by line?

Not unless you tell Perl so:

    foreach ( split /\n/, $EDM_nonactive_tapelist ) {
        print "$_\n" unless substr($_, 0, 5) eq '*Orig';
    }

> Do I even need the foreach statement?

No.

    print map "$_\n", grep { substr($_, 0, 5) ne '*Orig' }
      $EDM_nonactive_tapelist =~ /(.+)/mg;

;-)
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
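
The reason the original pattern turned into ^*Orig can be shown directly: inside a double-quoted string the backslashes are consumed before the regex engine sees the text. The sketch below also filters a made-up tape list; $tapelist and its contents are invented stand-ins for the real output.

```perl
use strict;
use warnings;

# In a double-quoted string the backslashes are consumed before the
# regex engine ever sees them, which leaves the pattern ^*Orig.
my $from_string = "\^\*Orig";

# qr// (or a literal /.../) keeps the escapes intact.
my $from_qr = qr/^\*Orig/;

# Hypothetical stand-in for the tape list held in one scalar.
my $tapelist = "*Orig Vol: AAAA\n Orig Vol: BBBB\n*Orig Vol: CCCC\nMedia duplication is not enabled.\n";

# Split into lines and keep those that do not start with "*Orig".
my @kept = grep { $_ !~ $from_qr } split /\n/, $tapelist;

print "pattern from string: $from_string\n";
print "$_\n" for @kept;
```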

Re: regex problem

2004-08-09 Thread DBSMITH

I am still getting the same error with your suggestion.  Does foreach read 
line by line?  Do I need the foreach? 

Derek B. Smith
OhioHealth IT
UNIX / TSM / EDM Teams
614-566-4145





Felix Li [EMAIL PROTECTED]
08/09/2004 03:56 PM

 
To: [EMAIL PROTECTED]
cc: 
Subject:Re: regex problem


perhaps you meant ^\* ... rather than \^\* ...

the latter will trap things beginning with ^* ...


RE: regex problem

2004-08-09 Thread Charles K. Clarkson

[EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

: All I am getting the error from my if statement:
: 
: ^* matches null string many times in regex; marked by <--
: HERE in m/^* <--
: HERE Orig/ at .
: 
: I am trying to get everything except *Orig in this output :
: 
: *Orig Vol: 1703FBBDED58D4AD (E00117), Seq #: 000114 in TLU:
: st_9840_acs_0, media: STK 984e
:  Orig Vol: 0303E68522777483 (E00486), Seq #: 000800 in TLU:
: st_9840_acs_0, media: STK 984e
: 
: 07/12/2004 18:13:17 Rotation
: ID:4A03CC27.A30DEE72.0200.0E0B8707, 5
: backups
:  Media duplication is not enabled.
: 
: *Orig Vol: 4A03CC27A30DEE72 (E00632), Seq #: 000273 in TLU:
: st_9840_acs_0, media: STK 984e
: 
: Here is my code:
: 
: foreach ($EDM_nonactive_tapelist) {
: if ($EDM_nonactive_tapelist !~ "\^\*Orig") {
: print $_;
: }
: }
: 
: *NOTE the variable $EDM_nonactive_tapelist has the Orig strings
: in it. Does foreach read line by line?

No. 'foreach' as used above aliases $_ to each element
of a list of scalars one item at a time. The function does
not know the concept of "line".

You have provided a list of one scalar -
$EDM_nonactive_tapelist. The loop will process
$EDM_nonactive_tapelist once and place its value in $_.
Any changes to $_ will also change $EDM_nonactive_tapelist.

Assuming $EDM_nonactive_tapelist has a list of strings
separated by newlines (\n), a list of those strings
might be expressed as this.

foreach my $string ( split /\n/, $EDM_nonactive_tapelist ) {
    print "$string\n" if $string !~ /^\*Orig/;
}

   In this example we have taken each string and placed it
in a scalar variable named $string. $string is tested and
printed if that test is true. The 'split' splits the
scalar at each newline and discards that character.


: Do I even need the foreach statement?

I'm not sure. "has the Orig strings in it" is not a
precise statement for a computer programmer.


Question: How did this list of strings get into a
  single scalar?



HTH,

Charles K. Clarkson
-- 
Mobile Homes Specialist
254 968-8328



RE: regex problem

2004-08-09 Thread DBSMITH

It is a system app call that populates $EDM_nonactive_tapelist.
I am not sure what you mean by "I'm not sure. "has the Orig strings in it" is not a
precise statement for a computer programmer."


The variable $EDM_nonactive_tapelist, which is a file with the Orig strings in it!

the foreach with the split did work!

thanks!

Derek B. Smith
OhioHealth IT
UNIX / TSM / EDM Teams







RE: regex problem

2004-08-09 Thread Charles K. Clarkson

[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote:

: it is a system app call that populates the
: $EDM_nonactive_tapelist I am not sure what you mean
: I'm not sure. "has the Orig strings in it" is not a
: precise statement for a computer programmer.


I meant that "has the Orig strings in it" does not
tell us how the strings are represented. It does not
precisely define how the data is structured.


That statement does not accurately describe the
data. Here are two examples of strings listed in a
scalar. In both cases I could describe each of these
examples as "a scalar variable with strings in it".

$baz = [
    [ 'foo' ],
    [ 'bar' ],
];

$baz = "foo\nbar\n";

As computer programmers, we have to describe
data precisely. If you are uncertain how to describe
a structure try printing it with Data::Dumper.


: the foreach with the split did work!

Great. I'm glad I could help.


HTH,

Charles K. Clarkson
-- 
Mobile Homes Specialist
254 968-8328



Yet Another Regex Problem

2004-06-08 Thread Francesco del Vecchio

Hi guyz,

this regex are goin' to drive me crazy!

My problem is:

I have to find URLs in a text file (so, cannot use LWP or HTML parser)

I've tried with something like

/(http.:\/\/.*\s)/

hoping to find anything starting with http:// or https:// and catching everything
up to a space
or newline.

It works in some cases, but it catches the widest possible match, so if I have
something like

"try to click here http://www.yahoo.com or there http://www.google.com";

the result for $1 is:
"http://www.yahoo.com or there http://www.google.com"

How can I get simply "http://www.yahoo.com" and then "http://www.google.com"?

thanx very much...you'r saving a man
Francesco






Re: Yet Another Regex Problem

2004-06-08 Thread Jeff 'japhy' Pinyan

On Jun 8, Francesco del Vecchio said:

> I have to find URLs in a text file (so, cannot use LWP or HTML parser)

I'm curious why you can't use a module to extract URLs, but I'll continue
anyway.

> /(http.:\/\/.*\s)/

That regex is broken in a few ways.  First, it does NOT match 'http:', it
only matches 'http_:', where there is some character between the p and the
colon.  Second, the .* in it is greedy (it matches as much as it can).
Third, it requires your URL to be followed by a space, which won't always
be the case.

> "try to click here http://www.yahoo.com or there http://www.google.com";

I would suggest trying:

  @urls = $string =~ m{(https?://\S+)}g;

Using \S+ makes it match one or more non-whitespace characters.  The only
problem with this is that if there happens to be punctuation after the
URL, it'll get included.  An example is this:

  Go to http://www.yahoo.com, and you'll see what I mean.

That will match `http://www.yahoo.com,' (including the comma).

-- 
Jeff japhy Pinyan  [EMAIL PROTECTED]  http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
CPAN ID: PINYAN[Need a programmer?  If you like my work, let me know.]
stu what does y/// stand for?  tenderpuss why, yansliterate of course.
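
The trailing-punctuation problem described above can be reduced by trimming each match afterwards. This is a sketch, not a general URL parser: the sub name extract_urls and the set of characters treated as sentence punctuation are assumptions.

```perl
use strict;
use warnings;

# Extract http/https URLs, then drop trailing punctuation that most
# likely belongs to the sentence. The punctuation set is an assumption.
sub extract_urls {
    my ($text) = @_;
    my @urls = $text =~ m{(https?://\S+)}g;
    s/[.,;:!?)"']+\z// for @urls;
    return @urls;
}

my $text = 'Go to http://www.yahoo.com, or try https://www.google.com/search?q=perl.';
print "$_\n" for extract_urls($text);
```

A URL that really ends in one of those characters would be cut short, so the character class is a trade-off to adjust for the data at hand.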



Re: Yet Another Regex Problem

2004-06-08 Thread Ramprasad A Padmanabhan

Change your regex to /http(s)*:\/\/.*?\s/

To see the docs 
perldoc perlre ... look for "greedy"

HTH
Ram







Re: Substitution/Regex problem

2004-05-02 Thread Cedric Godin

On Thursday 29 April 2004 10:31, Owen wrote:
 I would like to replace all instances of

  @non_space_characters[non_space_characters] with
  $non_space_characters[non_space_characters]

 The program below gets the first one only. How do I get the others?

 TIA

 Owen
 ---
 #!/usr/bin/perl -w
 use strict;

 my $line;
 while (<DATA>){
 $line=$_;
 #$line=~s/(@)(\S+)(\[\S+\])/\$$2$3/g;
 $line=~s/(@)(\S+\[\S+\])/\$$2/g;

 print "$line\n";

 }
 __DATA__
 @[EMAIL PROTECTED]@banana[4];

don't be greedy ;-)

s/\@(\S+?\[\S+?\])/\$$1/g;


Re: Substitution/Regex problem

2004-04-29 Thread John W. Krahn

Owen wrote:
 
 I would like to replace all instances of
 
  @non_space_characters[non_space_characters] with
  $non_space_characters[non_space_characters]
 
 The program below gets the first one only. How do I get the others?
 
 ---
 #!/usr/bin/perl -w
 use strict;
 
 my $line;
 while (<DATA>){
 $line=$_;

Why not just:

while ( my $line = <DATA> ) {


 #$line=~s/(@)(\S+)(\[\S+\])/\$$2$3/g;
 $line=~s/(@)(\S+\[\S+\])/\$$2/g;

+, * and ? are greedy so they will match the longest string that they
can.  Your complete line up to the newline will be matched by \S+ so you
want to be more selective in what you will match.  Since user-defined
variable names must consist of alphanumeric and _ characters, you can use
\w+ instead.  Also, why are you capturing the @ into $1?

$line =~ s/@(\w+\[[^]]+])/\$$1/g;


 print "$line\n";
 
 }
 __DATA__
 @[EMAIL PROTECTED]@banana[4];


John
-- 
use Perl;
program
fulfillment
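
The difference between the greedy \S+ and the more selective \w+ can be seen on a single line. A sketch only: the input line is invented, because the sample data in the thread was mangled by the archive's address obfuscation.

```perl
use strict;
use warnings;

# Hypothetical input line; the original sample data is not recoverable.
my $line = '@apple[1]@orange[$i]@banana[4];';

# Greedy \S+ swallows the whole line, so only the first @ is rewritten.
(my $greedy = $line) =~ s/\@(\S+\[\S+\])/\$$1/g;

# \w+ and [^]]+ cannot run past a name or a closing bracket.
(my $fixed = $line) =~ s/\@(\w+\[[^]]+\])/\$$1/g;

print "$greedy\n$fixed\n";
```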


Regex problem

2003-09-24 Thread Segree, Gareth

I have a directory of files that I want to move to another directory. 
(eg. ALLY20030111W.eps
 TEST W20030122
 HELP WANTED20030901WW.eps
 GIRL WATCH BIRD 20030101
etc..)

I want to be able to parse the filename and replace the date portion 
with any date (eg $1=ALLY $2=20030111 $3=W $4=.eps)
Then I want to make $2=20030925 and if $3 is empty then I assign .eps
to $3 or if $4 is empty then assign .eps


How do I do this?

#!/usr/bin/perl
# move_file.plx
use warnings;
use strict;

$source = "/path/to/source/";
$destination = "/path/to/destination/";
$query = '([A-Za-z]+)(\s*?)([0-9]*)(\s*?)([A-Za-z]*)([eps])'
opendir DH, $source or die "Couldn't open the current directory: $source";
while ($_ = readdir(DH)) {
   next if $_ eq "." or $_ eq "..";

   if (/$query/) {
      print "Copying $_ ...\n";
      rename "$source$_", "$destination$_";
      print "file copied successfully.\n";
   }
}
 
What's wrong with my code. Am I overlooking something?


REGEX PROBLEM

2003-07-25 Thread magelord

hi, i have the following strings:


/tmp/test/.test.txt
/tmp/test/hallo.txt
/tmp/test/xyz/abc.txt
/var/log/ksy/123.log


now i need a regex that matches all lines but the one that contains a
filename starting with a point. like .test.txt. how can i do that?

this is what i have:

'\.(?!tgz)[^.]*$' this matches everything, but tgz at the end of a
line, so

'(?!\.)[^.]*$' should do the job, but it doesnt:( 


THANK YOU:)


Re: matching file names starting with a dot (was: REGEX PROBLEM)

2003-07-25 Thread Janek Schleicher

magelor wrote at Fri, 25 Jul 2003 11:09:03 +0200:

 /tmp/test/.test.txt
 /tmp/test/hallo.txt
 /tmp/test/xyz/abc.txt
 /var/log/ksy/123.log
 
 
 now i need a regex that matches all lines but the one that contains a
 filename starting with a point. like .test.txt. how can i do that?
 
 this is what i have:
 
 '\.(?!tgz)[^.]*$' this matches everything, but tgz at the end of a
 line, so
 
 '(?!\.)[^.]*$' should do the job, but it doesnt:( 

If you only want to guarantee that the base filename doesn't start with a
dot, you might try something like

m{/(?!\.)\w+\.\w+$}
# or
m{/[^.]+\.\w+$}
# or
m{/[^/.][^/]*$}

The first two check whether there is a *.* file (with no leading dot) after
the last slash.
The third checks whether the last path component starts with a character
that is neither a slash nor a dot, which may also do what you want.
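
A quick way to check patterns like these is to run them over the four sample paths. The sketch below tries the first two, written with brace delimiters so that the ! in (?!...) cannot end the pattern early. The helper name matching_paths and the labels in %patterns are illustrative only.

```perl
use strict;
use warnings;

my @paths = qw(
    /tmp/test/.test.txt
    /tmp/test/hallo.txt
    /tmp/test/xyz/abc.txt
    /var/log/ksy/123.log
);

# Return the candidates that match the given compiled pattern.
sub matching_paths {
    my ($re, @candidates) = @_;
    return grep { $_ =~ $re } @candidates;
}

# Labels are arbitrary; the patterns are the first two suggestions.
my %patterns = (
    'lookahead'  => qr{/(?!\.)\w+\.\w+$},
    'no-dot run' => qr{/[^.]+\.\w+$},
);

for my $name (sort keys %patterns) {
    print "$name:\n";
    print "  $_\n" for matching_paths($patterns{$name}, @paths);
}
```

Both patterns should list the three paths other than /tmp/test/.test.txt.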

However, in general I would propose using a module to get an easily
understandable and robust solution:


use File::Basename;  # core module, ships with Perl

sub is_file_starting_with_dot {
    return basename($_[0]) =~ /^\./;
}

foreach ("/tmp/test/.test.txt",
         "/tmp/test/hallo.txt",
         "/tmp/test/xyz/abc.txt",
         "/var/log/ksy/123.log",
)
{
    print $_, is_file_starting_with_dot($_) ? " starts with dot" : " :-) ";
    print "\n";
}


Best Wishes,
Janek

PS: It's better not to shout at the reader with an uppercase subject that
isn't very detailed. I would have ignored you if it weren't Friday :-)




Re: REGEX PROBLEM

2003-07-25 Thread Kino

On Friday, Jul 25, 2003, at 18:09 Asia/Tokyo, [EMAIL PROTECTED] 
wrote:

/tmp/test/.test.txt
/tmp/test/hallo.txt
/tmp/test/xyz/abc.txt
/var/log/ksy/123.log
now i need a regex that matches all lines but the one that contains a
filename starting with a point. like .test.txt. how can i do that?
this is what i have:

'\.(?!tgz)[^.]*$' this matches everything, but tgz at the end of a
line, so
'(?!\.)[^.]*$' should do the job, but it doesnt:(

Because your expression matches, for example, just 'txt' too. Try

	/\/[^.\/][^\/]+$/g;

or

	/^.+\/[^.\/][^\/]+$/g;



Kino


Re: REGEX PROBLEM

2003-07-25 Thread Rob Dixon


[EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 hi, i have the following strings:


 /tmp/test/.test.txt
 /tmp/test/hallo.txt
 /tmp/test/xyz/abc.txt
 /var/log/ksy/123.log


 now i need a regex that matches all lines but the one that contains a
 filename starting with a point. like .test.txt. how can i do that?

 this is what i have:

 '\.(?!tgz)[^.]*$' this matches everything, but tgz at the end of a
 line, so

 '(?!\.)[^.]*$' should do the job, but it doesnt:(

The following will help. It first generates the file's basename in $1 by
capturing the string of all trailing characters which aren't '/', and then
checks to ensure that the basename doesn't start with a dot.

HTH,

Rob


  while (<DATA>) {
    chomp;
    if ( m{([^/]*)$} and $1 =~ /^[^.]/ ) {
      print $_, "\n";
    }
  }
  __DATA__
  /tmp/test/.test.txt
  /tmp/test/hallo.txt
  /tmp/test/xyz/abc.txt
  /var/log/ksy/123.log

OUTPUT

  /tmp/test/hallo.txt
  /tmp/test/xyz/abc.txt
  /var/log/ksy/123.log



