Re: Help me with a regex problem
You might consider using Regexp::Common::net. It provides a convenient set of patterns for matching IPv4, IPv6 and MAC addresses. https://metacpan.org/pod/Regexp::Common::net

On Fri, 25 Oct 2019 at 19:43, John W. Krahn wrote:
> On 2019-10-25 3:23 a.m., Maggie Q Roth wrote:
> > There are two primary types of lines in the log:
>
> What are those two types? How do you define them?
Re: Help me with a regex problem
On 2019-10-25 3:23 a.m., Maggie Q Roth wrote:
> Hello

Hello.

> There are two primary types of lines in the log:

What are those two types? How do you define them?

> 60.191.38.xx/
> 42.120.161.xx /archives/1005

From my point of view those two lines have two fields: the first looks like an IP address and the second looks like a file path. In other words, I can't distinguish the difference between these two "types".

> I know how to write regex to match each line, but don't get the good result
> with one regex to match both lines.
>
> Can you help?

Perhaps if you could describe the problem better?

John

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
Re: Help me with a regex problem
/(?[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s+(?\/.*)/

To avoid the "leaning toothpick" problem, Perl lets you use different match delimiters, so the above is the same as:

m#(?[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s+(?/.*)#

I assume you want to capture the IP and the path, right?

if ( $entry =~ m#([\d.]+)\s+(/\S+)# ) {
    my ($ip, $path) = ($1, $2);
    print "IP $ip asked for path $path\n";
}

On Fri, Oct 25, 2019 at 5:28 AM Илья Рассадин wrote:
> For example, this regex
>
> /(?[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s+(?\/.*)/

-- 
a
Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk
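A minimal sketch of one regex that accepts both sample lines. With `\s+` at least one whitespace character is required, and with `/\S+` at least one character must follow the slash, so a line such as `60.191.38.xx/` (a bare `/`, no space) would not match; the version below relaxes both to `\s*` and `\S*`. The sub name `parse_log_line` and the sample addresses are hypothetical (the `xx` in the thread is redacted, so digits are assumed).

```perl
use strict;
use warnings;

# Hypothetical helper; sample addresses are made up (the thread's "xx" is redacted).
# Return (ip, path) for a line like "60.191.38.1 /" or "42.120.161.2 /archives/1005",
# or an empty list when the line does not look like that.
# m{...} is used as the delimiter so the slash in the path needs no backslash.
sub parse_log_line {
    my ($line) = @_;
    if ( $line =~ m{\A\s*([0-9]{1,3}(?:\.[0-9]{1,3}){3})\s*(/\S*)} ) {
        return ( $1, $2 );
    }
    return;
}

for my $line ( '60.191.38.1/', '42.120.161.2 /archives/1005' ) {
    my ( $ip, $path ) = parse_log_line($line) or next;
    print "IP $ip asked for path $path\n";
}
```

The octet check only limits each part to 1 to 3 digits; it does not reject values above 255.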
Re: Help me with a regex problem
That is a backslash followed by a forward slash (\/), not a V. The backslash tells the regex parser to treat the next character as a literal character, which is useful for matching periods, question marks, brackets, etc. Here it stops the / from ending the /.../ pattern.

A period matches any single character except a newline, and an asterisk matches the preceding item zero or more times, so .* basically means "match everything up to the end of the line".

Apologies if this is formatted incorrectly. Sending from my phone.

On Fri, Oct 25, 2019 at 06:37 Maggie Q Roth wrote:
> what's V.*?

-- 
Benjamin Pendygraft
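A small sketch of the two points above: `\/` inside `/.../` and a bare `/` inside `m{...}` describe the same pattern, and `.*` stops at a newline unless the `/s` modifier is used. The sample string is made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample text with a newline in it.
my $text = "GET /archives/1005\nnext line";

# \/ escapes the slash so it does not end the /.../ pattern.
my ($escaped) = $text =~ /(\/.*)/;

# With m{...} as delimiters the slash is an ordinary character.
my ($braced) = $text =~ m{(/.*)};

# /s lets . match a newline too, so .* runs to the end of the string.
my ($dotall) = $text =~ m{(/.*)}s;

print "$escaped\n";    # /archives/1005
print "$braced\n";     # /archives/1005
print "[$dotall]\n";   # includes the second line
```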
Re: Help me with a regex problem
my $n = '[0-9]{1,3}';
if ( =~ ( m[ (?:$n\.){3} $n \s+ \S+ ]x ) {
    # match
}

On Fri, Oct 25, 2019 at 3:37 AM Maggie Q Roth wrote:
> what's V.*?
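A runnable sketch of the interpolation idea, assuming the line to test is passed in as an argument (the sub name `looks_like_log_line` is hypothetical). Precompiling the octet with `qr//` and using `/x` keeps the pattern readable; `\A` is added here so the address must start the line.

```perl
use strict;
use warnings;

# One IPv4 octet: 1 to 3 digits (no range check against 255).
my $n = qr/[0-9]{1,3}/;

# Hypothetical helper: true when the line starts with an IPv4-looking address,
# then whitespace, then at least one non-space character.
sub looks_like_log_line {
    my ($line) = @_;
    return $line =~ m[ \A (?:$n\.){3} $n \s+ \S+ ]x ? 1 : 0;
}

for my $line ( '60.191.38.1 /', '42.120.161.2 /archives/1005', 'hello world' ) {
    printf "%-30s %s\n", $line, looks_like_log_line($line) ? 'match' : 'no match';
}
```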
Re: Help me with a regex problem
what's V.*?

Maggie

On Fri, Oct 25, 2019 at 6:28 PM Илья Рассадин wrote:
> For example, this regex
>
> /(?[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s+(?\/.*)/
Re: Help me with a regex problem
For example, this regex

/(?[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\s+(?\/.*)/

On 25.10.2019 13:23, Maggie Q Roth wrote:
> There are two primary types of lines in the log:
>
> 60.191.38.xx /
> 42.120.161.xx /archives/1005
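The `(?...)` groups in this regex look like Perl 5.10+ named captures, which are written `(?<name>...)`. A minimal sketch with hypothetical names `ip` and `path`, read back through the `%+` hash; `\s*` is used instead of `\s+` so a line with no space before the slash also matches. The sub name `split_ip_path` is also hypothetical.

```perl
use strict;
use warnings;

# Named captures (Perl 5.10+): the captured text is available in %+ by name.
# The names "ip" and "path" are illustrative, not taken from the thread.
my $re = qr{
    (?<ip>   [0-9]{1,3} (?: \. [0-9]{1,3} ){3} )   # dotted-quad address
    \s*                                           # optional whitespace
    (?<path> / .* )                               # the slash and the rest of the line
}x;

# Hypothetical helper: (ip, path) on success, empty list otherwise.
sub split_ip_path {
    my ($line) = @_;
    return $line =~ $re ? ( $+{ip}, $+{path} ) : ();
}

for my $line ( '60.191.38.1 /', '42.120.161.2 /archives/1005' ) {
    my ( $ip, $path ) = split_ip_path($line) or next;
    print "ip=$ip path=$path\n";
}
```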
Help me with a regex problem
Hello

There are two primary types of lines in the log:

60.191.38.xx/
42.120.161.xx /archives/1005

I know how to write a regex to match each line, but I can't get a good result with one regex that matches both lines.

Can you help?

Thanks,
Maggie
regex problem?
The following code apparently is not doing what I wanted. My intention was to confirm that the general format of $student_id was this: several uppercase letters followed by a hyphen followed by several digits. If not, it would trigger the die. Unfortunately it seems to always trigger the die. For example, if I let student_id = triplett-1, the script dies. I’m a beginner, so I often have trouble seeing the “obvious.” Any suggestions will be appreciated!

if ( $student_id =~ /
        (\A[a-z]+)    # match and capture leading alphabetics
        -             # hyphen to separate surname from number
        ([0-9]+\z)    # match and capture trailing digits
    /xms              # Perl Best Practices
) {
    $student_surname = $1;
    $student_number  = $2;
}
else {
    die "Bad general form for student_id: $student_id"
};
Re: regex problem?
The only problem I can see is that you want UPPERCASE-1234 and your regex has lowercase. Try

(\A[A-Z]+)    # match and capture leading alphabetics

Andrew

p.s. Why not add "use strict; use warnings", "my $var;" and wear a seat belt when you're driving? :)

On Wed, Nov 25, 2015 at 5:09 PM, Rick T wrote:
> The following code apparently is not doing what I wanted. My intention was
> to confirm that the general format of $student_id was this: several
> uppercase letters followed by a hyphen followed by several digits.

-- 
Andrew Solomon
Mentor@Geekuni http://geekuni.com/
http://www.linkedin.com/in/asolomon
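A minimal sketch of the check with the `[A-Z]` class and the anchors outside the captures. It assumes `$student_id` may come from input such as `<STDIN>` and still carry a trailing newline: `\z` does not match before a newline, so an unchomped value would fail even with the right character class. The sub name `parse_student_id` is hypothetical, and `/i` is added only because the example `triplett-1` in the question is lowercase; drop it if only uppercase is valid.

```perl
use strict;
use warnings;

# Hypothetical helper. Return (surname, number) for ids like "TRIPLETT-1",
# or an empty list if the id is malformed.
sub parse_student_id {
    my ($student_id) = @_;
    chomp $student_id;    # a trailing "\n" would stop \z from matching
    # /i is an assumption: it also accepts lowercase ids like "triplett-1".
    if ( $student_id =~ / \A ([A-Z]+) - ([0-9]+) \z /xi ) {
        return ( $1, $2 );
    }
    return;
}

my ( $surname, $number ) = parse_student_id("triplett-1\n")
    or die "Bad general form for student_id\n";
print "$surname / $number\n";
```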
Fwd: regex problem?
-- Forwarded message --
From: Raj Barath <barat...@outlook.com>
Date: Wed, Nov 25, 2015 at 1:16 PM
Subject: Re: regex problem?
To: Rick T <p...@reason.net>

Hi Rick,

You can use split. For example:

my ( $stud_surname, $stud_number ) = split ( /-/, $student_id );

You are splitting on the hyphen character.

-Raj

On Wed, Nov 25, 2015 at 1:09 PM, Rick T <p...@reason.net> wrote:
> The following code apparently is not doing what I wanted. My intention was to confirm that the general format of $student_id was this: several uppercase letters followed by a hyphen followed by several digits.
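`split` on its own does not check the format, so `smith` or `smith-12-x` would still produce some result. A sketch that splits into at most two pieces (the limit argument `2`) and then validates each piece; the sub name `split_student_id` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split on the first hyphen, then validate both parts.
sub split_student_id {
    my ($student_id) = @_;
    # Limit of 2: everything after the first hyphen stays in the second field.
    my ( $surname, $number ) = split /-/, $student_id, 2;
    return unless defined $number;                  # no hyphen at all
    return unless $surname =~ /\A[A-Za-z]+\z/;      # letters only
    return unless $number  =~ /\A[0-9]+\z/;         # digits only
    return ( $surname, $number );
}

for my $id ( 'triplett-1', 'smith', 'smith-12-x' ) {
    my @parts = split_student_id($id);
    print "$id: ", ( @parts ? "@parts" : 'bad format' ), "\n";
}
```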
Re: regex problem?
On Wed, 25 Nov 2015 17:22:04 + Andrew Solomon wrote:
> The only problem I can see is that you want UPPERCASE-1234 and your
> regex has lowercase. Try
>
> (\A[A-Z]+) # match and capture leading alphabetics

Please put the anchor outside the capture. And you could use the POSIX conventions:

m{ \A ([[:upper:]]+) }msx;

This will also match non-English uppercase letters (given properly decoded Unicode strings). :)

-- 
Don't stop where the ink does.
Shawn
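A sketch of the POSIX-class version with both anchors outside the captures. One assumption worth stating: `[[:upper:]]` follows Unicode rules for characters such as `É` only when the match uses Unicode semantics; the `/u` modifier (Perl 5.14+) requests that explicitly. The sub name `parse_upper_id` and the sample ids are made up.

```perl
use strict;
use warnings;

# Hypothetical helper; sample ids below are made up.
# \A and \z sit outside the capture groups; /u asks for Unicode rules (Perl 5.14+).
sub parse_upper_id {
    my ($id) = @_;
    return $id =~ m{ \A ([[:upper:]]+) - ([0-9]+) \z }xu ? ( $1, $2 ) : ();
}

binmode STDOUT, ':encoding(UTF-8)';
for my $id ( "TRIPLETT-1", "\x{C9}TIENNE-42", "triplett-1" ) {
    my @p = parse_upper_id($id);
    print "$id => ", ( @p ? "@p" : 'no match' ), "\n";
}
```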
A regex problem?
I have a web form with a text area that I feed back through a CGI script, and I filter the text with:

$q1_elaborate =~ s/[^[:alpha:]' .-]//g;
quotemeta($q1_elaborate);

I admit to doing a Google search on "perl remove malicious code" and took that code from one of the results (without quite understanding what it does). However, it removes line feeds as well, so maybe that code is not all that good.

Just wondering if this would be just as adequate for filtering malicious code:

$q1_elaborate =~ s/[`\\|!\.\^]//g;

TIA
-- 
Owen
Re: A regex problem?
On Mon, Aug 13, 2012 at 5:42 AM, Owen <rc...@pcug.org.au> wrote:
> I have a web form with a text area that I feed back through a cgi script and filter the text with;
>
> $q1_elaborate =~ s/[^[:alpha:]' .-]//g;
> quotemeta($q1_elaborate);
>
> However, it removes line feeds as well, so maybe that code is not all that good.

Well, the idea is to remove anything that might be bad, but whitespace isn't bad, so change that one blank in there to the \s metachar:

$q1_elaborate =~ s/[^[:alpha:]'\s.-]//g;
quotemeta($q1_elaborate);

The trick here is that it uses a character class for the match, and the initial caret (^) negates the class, so it means "replace anything that is not alphabetic, a single quote, whitespace, a literal period or a dash with nothing".

However (perldoc -f quotemeta):

    quotemeta EXPR
    quotemeta
        Returns the value of EXPR with all non-word characters
        backslashed. (That is, all characters not matching
        /[A-Za-z_0-9]/ will be preceded by a backslash in the
        returned string, regardless of any locale settings.) This
        is the internal function implementing the \Q escape in
        double-quoted strings.

The key there being "returns", so I believe you'd want

$q1_elaborate = quotemeta($q1_elaborate);

Finally, while it probably doesn't matter here, IMNSHO you should check your matching and react accordingly. If $q1_elaborate has one of the non-valid chars, do you care?

if ( $q1_elaborate =~ s/[^[:alpha:]'\s.-]//g ) {
    # if appropriate
    warn("Non-valid chars in q1_elaborate\n");
}
$q1_elaborate = quotemeta($q1_elaborate);

Again, not a big gain here, but as a rule of thumb, doing your match/subst in an if or if/else will give you a more robust program.

-- 
a
Andy Bach
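A small sketch of the corrected filter, assuming the goal is to keep letters, apostrophes, whitespace (including newlines), periods and hyphens. It also shows that `s///g` returns the number of substitutions made, which is what the `if` test above relies on, and that `quotemeta` returns a new string rather than changing its argument. The sub name `clean_text` and the sample input are hypothetical; whether this is adequate protection depends on where the text is used later (HTML, SQL, shell), which the thread doesn't say.

```perl
use strict;
use warnings;

# Hypothetical helper. Remove every character that is not a letter, apostrophe,
# whitespace, period or hyphen. Returns the cleaned copy and how many characters went.
sub clean_text {
    my ($text) = @_;
    my $removed = ( $text =~ s/[^[:alpha:]'\s.-]//g ) || 0;
    return ( $text, $removed );
}

my ( $clean, $removed ) = clean_text("Hi <b>there</b>\nok");
warn "Removed $removed non-valid characters\n" if $removed;
print "$clean\n";    # the newline between "there" and "ok" is kept

# quotemeta does not modify its argument; it returns the escaped string.
my $quoted = quotemeta('a.b');    # 'a\.b'
```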
regex problem
i have csv files in the following format, where some fields are enclosed in double quotes if they have commas embedded in them and all other fields are simply comma-delimited without any encapsulation, such as

some,data,more,data,numbers,etc,"data with a , in the datastream",yet more data,"possibly more embedded ,'s",and,so,on,,,

changing the formatting of the source file to enclose all fields in double quotes is not an option.

i'm trying to figure out a regex, split, or some other functionality that will allow me to either

1. wrap each 'bare' field in double quotes (ignoring the embedded commas in the encapsulated fields), or
2. extract each field, automatically determining if commas should be ignored inside double quotes

i know it should be relatively simple but i'm not yet fluent enough in regex to grasp the necessary double quote exceptions.

any help is greatly appreciated.

tia,
joe

-- 
since this is a gmail account, please verify the mailing list is included in the reply to addresses
Re: regex problem
On 10-11-05 09:34 AM, jm wrote:
> i have csv files in the following format, where some fields are enclosed in double quotes if they have commas embedded in them and all other fields are simply comma-delimited without any encapsulation

The best way to deal with CSV is to use a module from CPAN.

Text::CSV http://search.cpan.org/~makamaka/Text-CSV-1.20/lib/Text/CSV.pm
Text::CSV_XS http://search.cpan.org/~hmbrand/Text-CSV_XS-0.76/CSV_XS.pm

-- 
Just my 0.0002 million dollars worth,
  Shawn

Programming is as much about organization and communication as it is about coding.

The secret to great software: Fail early & often.

Eliminate software piracy: use only FLOSS.
Re: regex problem
On Fri, Nov 5, 2010 at 8:34 AM, jm <jm5...@gmail.com> wrote:
> changing the formatting of the source file to enclose all fields in double quotes is not an option.
> i'm trying to figure out a regex, split, or some other functionality that will allow me to either
> 1. wrap each 'bare' field in double quotes (ignoring the embedded commas in the encapsulated fields) or
> 2. extract each field, automatically determining if commas should be ignored inside double quotes

Try the Text::CSV module http://search.cpan.org/%7Emakamaka/Text-CSV-1.20/lib/Text/CSV.pm. It handles all of these details for you.

-- 
Robert Wohlfarth
Re: regex problem
i appreciate the tips. unfortunately, adding modules to this server is not currently possible. does anyone have a more 'hands-on' solution?

On Fri, Nov 5, 2010 at 8:53 AM, Shawn H Corey <shawnhco...@gmail.com> wrote:
> The best way to deal with CSV is to use a module from CPAN.
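For the "no modules" case, a hand-rolled sketch using `\G` and `/g` to walk the line one field at a time. It assumes each field is either bare text without commas or quotes, or double-quoted text that contains no escaped or doubled quotes; real CSV allows `""` inside quoted fields, which this does not handle. The sub name `split_csv_line` and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. Split one CSV line into fields. Each field is either
# "quoted" (may contain commas) or bare text without commas or quotes.
# A field ends at a comma or at the end of the string.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    while ( $line =~ /\G(?:"([^"]*)"|([^,"]*))(,|\z)/g ) {
        push @fields, defined $1 ? $1 : $2;
        last if $3 eq '';    # reached end of string
    }
    return @fields;
}

my $line = q{some,data,"data with a , in the datastream",yet more data,on,,};
print join( ' | ', split_csv_line($line) ), "\n";
```

When reading from a file, `chomp` each line first so the newline does not end up in the last field.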
RE: regex problem
From: jm [mailto:jm5...@gmail.com]
Sent: Friday, November 05, 2010 10:21 AM

> i appreciate the tips. unfortunately, adding modules to this server is not currently possible. does anyone have a more 'hands-on' solution?

Take a look at the Text::ParseWords module. It is part of the core Perl distribution, so it should already be installed; see perldoc Text::ParseWords. I have used it for similar problems in the past.

Ken
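A minimal sketch with Text::ParseWords, using `parse_line` with a comma as the delimiter; the second argument `0` means the surrounding quotes are removed from the returned fields. One caveat: `parse_line` also treats single quotes as quoting characters, so a bare field containing an apostrophe (such as `it's` outside double quotes) can confuse it. The sub name `csv_fields` is hypothetical, and the `map` that turns undefined fields into empty strings is a defensive choice, not something the module requires.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper. Split a CSV line on commas, honouring double-quoted fields.
# $keep = 0 strips the quotes; any undef fields are turned into '' defensively.
sub csv_fields {
    my ($line) = @_;
    return map { defined $_ ? $_ : '' } parse_line( ',', 0, $line );
}

my $line = 'some,data,"data with a , in the datastream",yet more data';
print join( ' | ', csv_fields($line) ), "\n";
```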
Regex problem
To check the date passed to a script, I first check that the date is 20 followed by exactly 6 digits.

But the regex is wrong. I tried /^20\d{6}/, /^20\d{6,6}?/ and /^20\d{6,}?/, and while a number with 7 or fewer digits fails, e.g. 2009101, a 9-digit number like 200910103 does not fail.

unless ( $ARGV[2] =~ /^20\d{6}?/ ) {
    print "$ARGV[2]\tdate format is YYYYMMDD, eg 20091031\n";
}

How do I get the regex to fail a 9-digit number?

I suppose as a workaround, I could say:

unless ((length($ARGV[2]) == 8) and ( $ARGV[2] =~ /^20/)) {fail}

TIA
Owen
Re: Regex problem
- Original Message -
From: Owen <rc...@pcug.org.au>
Newsgroups: perl.beginners

Hello Owen

> unless ( $ARGV[2] =~ /^20\d{6}?/ ) {
> ...
> How do I get the regex to fail a 9 digit number

unless ( $ARGV[2] =~ /^20\d{6}$/ ) ...

If the end-of-line anchor, $, is used, the regex will accept an 8-digit number only if it is the entire content of $ARGV[2] (apart from an optional trailing newline, which $ also allows).

Chris
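A sketch of the fully anchored check. Because `$` also matches just before a trailing newline, `\z` is used here so a value ending in a newline is rejected as well. The sub name `is_yyyymmdd` is hypothetical, and it checks only the shape (20 followed by six digits), not whether the month and day are valid.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for exactly "20" followed by six digits,
# with nothing before or after (not even a newline).
sub is_yyyymmdd {
    my ($s) = @_;
    return $s =~ /\A20[0-9]{6}\z/ ? 1 : 0;
}

for my $arg ( '20091031', '2009101', '200910103', "20091031\n" ) {
    ( my $shown = $arg ) =~ s/\n/\\n/;
    printf "%-12s %s\n", $shown,
        is_yyyymmdd($arg) ? 'ok' : 'date format is YYYYMMDD, eg 20091031';
}
```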
Regex problem
I have a lengthy list of data that I read in. I have substituted a one-line example using __DATA__.

The desired output would be
91416722 243rd St

I am getting this as output
91416722rd St - just the rd St

The capturing reference on (\s)..$1 is not working.

# Intent
# Look for 243 preceded by any white space, followed by a space char
# Capture the whitespace as $1
# Replace with whatever the leading whitespace was, then the number, then the suffix rd and then the trailing space char

Basically add the suffix rd to the number 243, i.e. 243rd.

I can do something else, but I was wondering what I am doing wrong here.

Thanks
jbl

#!/usr/bin/perl -w
use strict;

open MY_OUTPUT_FILE, ">Export_Output_mod.txt"
    or die "Can't write to out.txt: $!";

while ( defined ( my $line = <DATA> ) ) {
    $line =~ s/(\s)243 /$1243rd /g;
    print MY_OUTPUT_FILE $line;
}
close MY_OUTPUT_FILE;

__END__
91416722 243 St
Re: Regex problem
jbl wrote:
> I have a lengthy list of data that I read in. I have substituted a one line example using __DATA__.
> The desired output would be
> 91416722 243rd St
> I am getting this as output
> 91416722rd St - just the rd St
> The capturing reference on (\s)..$1 is not working
>
> while ( defined ( my $line = <DATA> ) ) {
>     $line =~ s/(\s)243 /$1243rd /g;

    $line =~ s/(\s)243 /${1}243rd /g;

>     print MY_OUTPUT_FILE $line;
> }

-- 
Just my 0.0002 million dollars worth,
  Shawn

Programming is as much about organization and communication as it is about coding.

I like Perl; it's the only language where you can bless your thingy.
Re: Regex problem
On Mon, Dec 21, 2009 at 9:11 AM, jbl <jbl...@gmail.com> wrote:
> The desired output would be 91416722 243rd St
> I am getting this as output 91416722rd St - just the rd St
snip
> while ( defined ( my $line = <DATA> ) ) {
>     $line =~ s/(\s)243 /$1243rd /g;
>     print MY_OUTPUT_FILE $line;
> }

Try this:

$line =~ s/(\s)243 /${1}243rd /g;

Without the braces, Perl reads $1243 as the variable for capture group number 1243, which doesn't exist, so it interpolates as empty. The braces separate the 1 from the 243.

-- 
Robert Wohlfarth
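A short sketch of the fix: `${1}` ends the variable name explicitly, so the digits that follow are literal text. The sample line mirrors the one in the thread; the sub name `add_rd_suffix` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: append "rd" to a 243 that is preceded by whitespace
# and followed by a space.
sub add_rd_suffix {
    my ($line) = @_;
    # "$1243rd" would mean capture group 1243; "${1}243rd" means group 1, then "243rd".
    $line =~ s/(\s)243 /${1}243rd /g;
    return $line;
}

print add_rd_suffix("91416722 243 St\n");    # 91416722 243rd St
```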
Re: Regex problem
Hi,

On Mar 11, 1:16 am, nore...@gunnar.cc (Gunnar Hjalmarsson) wrote:
> I would do:
>
> if ( $a =~ /\.(?:html|jpg)$/i )
>
> Please read http://perldoc.perl.org/perlretut.html and other appropriate docs.

I read the doc, but how do I negate the non-capturing grouping?

use strict;
my $a = 'a.gif';
if ($a =~ /^(?:html|jpg)/gi) {
    print 'not html or jpg';
}

Thanks,
Regex problem, #.*# on new line
Hiya

I've got a string like so, and for the life of me I can't get a regex to make each line start with #abc#.

my $a = "#aaa#message:details;extra:info;variable:times;#bbb#message:details;extra:info;variable:times;#ccc#not:always;the:same;ts:14:00.00;";

$a =~ s/(?!#.#)/$1\n/i;

I'm so desperate I even tried something as silly as

join( "\n", split(/#.*#/, "#aaa#message:details;extra:info;variable:times;#bbb#message:details;extra:info;variable:times;#ccc#not:always;the:same;ts:14:00.00;"))

If anyone can help, it would be much appreciated.

Kind Regards
Brent Clark
Re: Regex problem
On 3/10/09 Tue Mar 10, 2009 8:41 PM, howa <howac...@gmail.com> scribbled:
> Read the doc, but how to negate the Non-capturing groupings ?
>
> use strict;
> my $a = 'a.gif';
> if ($a =~ /^(?:html|jpg)/gi) {
>     print 'not html or jpg';
> }

That will test if $a starts with 'html' or 'jpg'. To test for a non-match, use the !~ operator:

if( $a !~ /(?:html|jpg)$/i ) {
    print "not html or jpg\n";
}
Re: Regex problem
Hello,

On Mar 12, 12:34 am, jimsgib...@gmail.com (Jim Gibson) wrote:
> That will test if $a starts with 'html' or 'jpg'. To test for a non-match,
> use the !~ operator:

I can't, since I will add more criteria to the regex, e.g. I need to match a.*, except a.html or a.jpg:

if ( $a =~ /a\.(?:html|jpg)$/i ) # of course this one does not work.

Thanks.
Re: Regex problem
On Wed, Mar 11, 2009 at 12:53, howa <howac...@gmail.com> wrote:
> I can't, since I will add more criteria into the regex, e.g.
>
> I need to match a.* , except a.html or a.jpg
>
> if ( $a =~ /a\.(?:html|jpg)$/i ) # of course this one does not work.
snip

You want a zero-width negative look-ahead:

#!/usr/bin/perl

use strict;
use warnings;

my @a = qw/a.html a.jpg a.gif/;

for my $s (@a) {
    print "$s ", $s =~ /a[.](?!html|jpg)/ ? "matches" : "does not match", "\n";
}

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
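A sketch that ties the look-ahead to the end of the name, assuming the goal is "any extension except exactly html or jpg": `(?!(?:html|jpg)\z)` rejects `a.html` and `a.jpg` but still accepts `a.htmlx` or `a.jpg.bak`. The `/i` modifier and the sub name `wanted_file` are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: match "a." followed by anything except exactly
# "html" or "jpg" (case-insensitive, which is an assumption).
sub wanted_file {
    my ($name) = @_;
    return $name =~ /\Aa\.(?!(?:html|jpg)\z)/i ? 1 : 0;
}

for my $name (qw(a.html a.jpg a.gif a.htmlx A.JPG)) {
    printf "%-8s %s\n", $name, wanted_file($name) ? 'matches' : 'does not match';
}
```

Because the look-ahead is zero-width, more conditions can still be added after it in the same pattern.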
Re: Regex problem, #.*# on new line
Brent Clark wrote:
> Hiya

Hello,

> I got a string like so, and for the likes of me I can get regex to have it that each line is starts with #abc#.
>
> $a =~ s/(?!#.#)/$1\n/i;

You are using a zero-width negative look-behind assertion which does not capture its contents. You are using a pattern that matches a total of three characters but you say you want to match five characters. You are using the /i option but there are no characters in the pattern that are affected by the /i option.

You probably want something like:

$x =~ s/(#...#[^#]*)/$1\n/g;

John
-- 
Those people who think they know everything are a great
annoyance to those of us who do. -- Isaac Asimov
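An alternative sketch that splits in front of every `#xxx#` marker with a zero-width look-ahead, so each record keeps its marker and starts its own line. It assumes the markers are always three lowercase letters between `#` characters, as in the sample; the sub name `records_per_line` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. Split before each "#abc#" marker (zero-width, so the
# marker is kept), then join the pieces with newlines.
sub records_per_line {
    my ($s) = @_;
    return join "\n", split /(?=#[a-z]{3}#)/, $s;
}

my $x = '#aaa#message:details;extra:info;variable:times;'
      . '#bbb#message:details;extra:info;variable:times;'
      . '#ccc#not:always;the:same;ts:14:00.00;';
print records_per_line($x), "\n";
```

A zero-width match at the very start of the string does not produce an empty leading field, so no blank first line appears.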
Regex problem
Hello,

Consider the code:

#===
use strict;
my $a = 'a.jpg';
if ($a =~ /(html|jpg)/gi) {
    print 'ok';
}
#===

Are the brackets () needed? Since I am not using a back reference, is there a better way?

Thanks.
Re: Regex problem
On 3/10/09 Tue Mar 10, 2009 8:19 AM, howa <howac...@gmail.com> scribbled:
> Is the brucket () must be needed? Since I am not using back reference, are there a better way?

No, the parentheses are not needed in this simple case. The pattern /html|jpg/i will work fine (you don't need the 'g' modifier since you are only looking for one match).

However, if you want other elements in your pattern, you may need parentheses to group sub-elements. For example, if you wanted to match only if the 'html' or 'jpg' were at the end of the string, then /html|jpg$/ will not work, as this pattern will match 'html' anywhere in the string. You will have to use /html$|jpg$/ or /(html|jpg)$/. Non-capturing parentheses can be used for clustering without capturing, as in /(?:html|jpg)$/.

You can make your regexes a little more readable with the 'x' modifier:

/ html | jpg /ix
Re: Regex problem
On Tue, Mar 10, 2009 at 11:19, howa <howac...@gmail.com> wrote:
> Is the brucket () must be needed? Since I am not using back reference, are there a better way?
snip

Since you have no other patterns in the regex, you do not need the parentheses. If you had other patterns and wanted to avoid the slowdown associated with capturing, you could use the non-capturing grouping parentheses:

/foo[.](?:html|jpg)/

-- 
Chas. Owens
Re: Regex problem
howa wrote:
> Consider the code:
>
> #===
> use strict;
> my $a = 'a.jpg';
> if ($a =~ /(html|jpg)/gi) {
>     print 'ok';
> }
> #===
>
> Is the brucket () must be needed?

Parentheses. What happened when you tried without them? And why the /g modifier?

> Since I am not using back reference, are there a better way?

I would do:

if ( $a =~ /\.(?:html|jpg)$/i )

Please read http://perldoc.perl.org/perlretut.html and other appropriate docs.

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
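A sketch of the anchored, non-capturing version without `/g` (a single yes/no test does not need it). The `\.` makes sure `html` or `jpg` is the extension rather than just the end of the name (so `ahtml` is rejected), and `\z` is used instead of `$` so a trailing newline is not allowed either. The sub name `is_page_or_image` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. (?:...) groups the alternation without capturing;
# \. and \z pin it to the extension at the very end of the name.
sub is_page_or_image {
    my ($name) = @_;
    return $name =~ /\.(?:html|jpg)\z/i ? 1 : 0;
}

for my $name (qw(a.jpg a.HTML a.gif ahtml a.jpg.bak)) {
    printf "%-10s %s\n", $name, is_page_or_image($name) ? 'ok' : 'no';
}
```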
RE: Simple regex problem has me baffled
Hi Shawn,

Here is the revised code fragment:

open ( DATA, "$INBOX/nlsrysows001_2090125.dat") || die "Cannot open source file: $!";
open ( FILE, ">$INBOX/request.dat") || die "Cannot open request file: $!";

chomp(@list=<DATA>);

foreach $entry(@list) {
    $entry =~ /\[([a-z0-9]{5})\]/;
    $req_id=$1;
    print "$req_id\n";
}

But I still get errors!

8252c
Use of uninitialized value in concatenation (.) or string at ./magic.pl line 19, <DATA> line 1044.
8252c
Use of uninitialized value in concatenation (.) or string at ./magic.pl line 19, <DATA> line 1044.
8252d
Use of uninitialized value in concatenation (.) or string at ./magic.pl line 19, <DATA> line 1044.
8252d
Use of uninitialized value in concatenation (.) or string at ./magic.pl line 19, <DATA> line 1044.

What is especially puzzling is that I have seen notation such as 'print "$1\n";' in other scripts.

Regards,
Bill Harpley

-Original Message-
From: Mr. Shawn H. Corey [mailto:shawnhco...@magma.ca]
Sent: Monday, January 26, 2009 4:32 PM
To: Bill Harpley
Cc: beginners@perl.org
Subject: Re: Simple regex problem has me baffled

On Mon, 2009-01-26 at 16:20 +0100, Bill Harpley wrote:
> foreach $entry(@list) {
>     $entry =~ /\[([a-z0-9]{5})\]/;
>     print "$1\n";          # print to screen
>     # print FILE "$1\n";   # print to file
> }

If there is no match, you are printing an uninitialized value; try:

foreach my $entry ( @list ){
    if( $entry =~ m{ \[ ( [a-z0-9]{5} ) \] }msx ){
        my $request_id = $1;
        # ...
    }
}

-- 
Just my 0.0002 million dollars worth,
  Shawn

It would appear that we have reached the limits of what it is possible to achieve with computer technology, although one should be careful with such statements, as they tend to sound pretty silly in 5 years. --John von Neumann, circa 1960
RE: Simple regex problem has me baffled
Hi John,

Thanks for your advice.

(1) I actually have 'use warnings' enabled in the complete script. 'use strict' just gives me a load of unrelated compilation errors like: Global symbol "$INBOX" requires explicit package name at ./magic.pl line 8.

(2) I have also tried a WHILE loop but the result is the same.

(3) The hex digits in the Request_Id all have A-F in lower case (so there is only the range a-f). However, it does no harm to put this in, just in case it changes in the future, so I have made this change.

(4) I tried [[:xdigit:]] but to no avail.

So I remain stuck at square one!!

Regards,
Bill

-Original Message-
From: John W. Krahn [mailto:jwkr...@shaw.ca]
Sent: Monday, January 26, 2009 5:20 PM
To: Perl Beginners
Subject: Re: Simple regex problem has me baffled

Bill Harpley wrote:
> Hello,

Hello,

> I have simple regex problem that is driving me crazy.
>
> I am writing a script to analyse a log file. It contains Java related information about requests and responses. Each pair of Request (REQ) and Response (RES) calls have a unique Request ID. This is a 5 digit hex number contained in square brackets (e.g. [81c2d] ).
>
> Using timestamps in each log entry, I need to calculate the time difference between the start of the Request and the end of the Response.
>
> As a first step, I thought I would identify the matching REQ/RES pairs in the log and then set about extracting the timestamp information and doing the calculations.
>
> I started with a simple script to extract the Request IDs from each log entry. Here is what one looks like (names have been changed to protect the innocent).
[2009-01-23 09:20:48,719]TRACE [server-1] [http-80-5] a...@mydomain.net :090123-092048567:f5825 (SetCallForwardStatusImpl.java:call:54) - RequestId [81e80] SetCallForwardStatus.REQ { accountNumber:=W12345, phoneNumber:=12121212121, onBusyStatus:=true, busyCurrent:=voicemail, onNoAnswerStatus:=false, noAnswerCurent:=voicemail, onUncondStatus:=false, uncondCurrent:=voicemail } So I need to extract the 5 hex digits in RequestId [81e80]. Sounds simple, eh? Here is a fragment of my initial script: You should have the warnings and strict pragmas at the beginning of your program to let perl help you find mistakes: use warnings; use strict; open ( DATA, $INBOX/sample.log) || die Cannot open source file: $!; open ( FILE, $INBOX/request.dat) || die Cannot open request file: $!; chomp(@list=DATA); foreach $entry(@list) { It looks like you don't really have to read the entire file into memory in order to process it. You should perhaps use a while loop instead which will only read one line at a time: while ( my $entry = DATA ) { chomp $entry; And you may not need to chomp the current line if you are not accessing the data at the end of the line. $entry =~ /\[([a-z0-9]{5})\]/; You are looking for hexadecimal digits so you want either [a-fA-F0-9] or [[:xdigit:]] instead. print $1\n; # print to screen The contents of $1 are only valid if the regular expression matched successfully, otherwise $1 retains the contents from the previously successful match. if ( $entry =~ /RequestId\s+\[([a-fA-F0-9]{5})\]/ ) { print $1\n; } # print FILE $1\n; # print to file } John -- Those people who think they know everything are a great annoyance to those of us who do.-- Isaac Asimov -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/ -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
RE: Simple regex problem has me baffled
Hi Gunnar,

I tried your suggestions but had no luck :-(

(1) I tried your idea of using a paragraph separator:

local $/ = ''; # paragraph mode
while ( my $entry = <DATA> ) {
    if ( $entry =~ /\[([a-z0-9]{5})]/ ) {
        print "$1\n";
    }
}

But the only output I got was:

# script.pl
8252c

So it found the first line and then quit, which means the separator is obviously just the usual "\n".

At some point, I was planning to convert the long wrapped lines into a single long line, to make the later timestamp analysis easier. This is how the event records appear in the log:

[2009-01-25 02:21:13,760]TRACE [server-1] [http-80-12] u...@mydomain.net:090125-022113763:4c213 (LimitVoIPLineImpl.java:call:54)
;- RequestId [8252c] LimitVoIPLine.REQ { accountNumber:=W1931627, phoneNumber:=1234512345 }
;[2009-01-25 02:21:22,104]TRACE [server-1] [http-80-12] u...@mydomain.net:090125-022113763:4c213 (LimitVoIPLineImpl.java:call:57)
;- RequestId [8252c] LimitVoIPLine.RES { LimitVoIPLine Result { Result:=Success } }
;[2009-01-25 02:21:34,675]TRACE [server-1] [http-80-20] u...@mydomain.net:090125-022134678:467d0 (LimitVoIPLineImpl.java:call:54)
;- RequestId [8252d] LimitVoIPLine.REQ { accountNumber:=W1931627, phoneNumber:=31455491773 }
;[2009-01-25 02:21:41,354]TRACE [server-1] [http-80-20] u...@mydomain.net:090125-022134678:467d0 (LimitVoIPLineImpl.java:call:57)
;- RequestId [8252d] LimitVoIPLine.RES { LimitVoIPLine Result { Result:=Success } }
;[2009-01-25 09:26:27,148]TRACE [server-1] [http-80-8] u...@mydomain.net:090125-092627068:48de4
;(GetCallForwardStatusImpl.java:call:52) - RequestId [82534] GetCallForwardStatus.REQ { accountNumber:=W1576824, phoneNumber:=1234512345
;}
;[2009-01-25 09:26:27,153]TRACE [server-1] [http-80-12] u...@mydomain.net:090125-092627077:5d89f
;(GetRestrictionListImpl.java:call:53) - RequestId [82535] GetRestrictionList.REQ { accountNumber:=W1576824, phoneNumber:=1234512345 }

So a single event record can be split across several lines (I assume this is not just a terminal wrap problem).
Is this what you meant when you said "Probably because your code splits each entry into multiple @list elements"? Would it be better to convert each record into a single long line before trying to perform the regex match? Is there an easy way to do this?

Regards,
Bill Harpley

-Original Message- From: Gunnar Hjalmarsson [mailto:nore...@gunnar.cc] Sent: Monday, January 26, 2009 5:22 PM To: beginners@perl.org Subject: Re: Simple regex problem has me baffled
RE: Simple regex problem has me baffled
Rob,

Thanks for your suggestion. It worked!!

# script.pl
8252c 8252c 8252d 8252d 82534 82535 82535 82534 8253c 8253c 8253f 8253f 82542 82543 - big long list -

So this is what did the trick:

while (<DATA>) {
    next unless /RequestId \[([[:xdigit:]]+)\]/;
    print "$1\n";
}

Can you explain why this works but my original effort did not?

Many thanks,
Bill Harpley

-Original Message- From: Rob Dixon [mailto:rob.di...@gmx.com] Sent: Monday, January 26, 2009 7:19 PM To: Perl Beginners Cc: Bill Harpley Subject: Re: Simple regex problem has me baffled
Re: Simple regex problem has me baffled
Bill Harpley wrote:
> Hi Gunnar, I tried your suggestions but had no luck :-(
> (1) I tried your idea of using a paragraph separator
> local $/ = ''; # paragraph mode
> while ( my $entry = <DATA> ) { if ( $entry =~ /\[([a-z0-9]{5})]/ ) { print "$1\n"; } }
> But the only output which got was: # script.pl 8252c
> So it found the first line and then quit. So the separator is obviously the usual "\n";

So it seems.

> At some point, I was planning to convert the long wrapped lines into a single long line, to make the later timestamp analysis easier. This is how the event records appear in the log:
> ...snip...
> So a single event record can be split across several lines (I assume this is not just a terminal wrap problem). Is this what you mean when you said that "Probably because your code splits each entry into multiple @list elements."

Yes.

> Would it be better to convert each record into a single long line before trying to perform regex match?

Well, it might make the next steps easier, but first of all we ought to let Perl do the job, right? Even if paragraph mode is not applicable, you are going to analyze the log file, so it makes sense to separate the log entries from each other somehow. Now that I know a little more about the structure of the log, this is what I would try next:

local $/ = "}\n;";
while ( my $entry = <DATA> ) {
    if ( $entry =~ /\[([a-z0-9]{5})]/ ) {
        print "$1\n";
    }
}

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
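Bill's question of how to turn each multi-line record into a single line can be sketched as below. This is a minimal illustration under stated assumptions, not Gunnar's method: it assumes every record starts with a "[YYYY-MM-DD ..." timestamp, treats any other line as a continuation of the previous record, and assumes the leading ";" seen in the pasted sample is a paste artifact rather than part of the log. The helper name read_records and the trimmed sample are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read a log from a filehandle and return one string per
# logical record. Assumes each record begins with a "[YYYY-MM-DD ..." timestamp
# and that any other line continues the previous record.
sub read_records {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/^;//;    # assumption: the leading ';' in the pasted sample is not real data
        if ( $line =~ /^\[\d{4}-\d{2}-\d{2} / || !@records ) {
            push @records, $line;        # start of a new record
        }
        else {
            $records[-1] .= " $line";    # continuation: append to the current record
        }
    }
    return @records;
}

# Trimmed copy of the sample from the thread.
my $sample = <<'LOG';
[2009-01-25 02:21:13,760]TRACE [server-1] (LimitVoIPLineImpl.java:call:54)
;- RequestId [8252c] LimitVoIPLine.REQ { accountNumber:=W1931627 }
;[2009-01-25 02:21:22,104]TRACE [server-1] (LimitVoIPLineImpl.java:call:57)
;- RequestId [8252c] LimitVoIPLine.RES { Result:=Success }
LOG

open my $fh, '<', \$sample or die "Cannot open in-memory log: $!";
print "$_\n" for read_records($fh);
close $fh;
```

Each record comes back as one line, so a later regex can see the timestamp and the RequestId together.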
Simple regex problem has me baffled
Hello,

I have a simple regex problem that is driving me crazy.

I am writing a script to analyse a log file. It contains Java-related information about requests and responses. Each pair of Request (REQ) and Response (RES) calls has a unique Request ID. This is a 5-digit hex number contained in square brackets (e.g. [81c2d]).

Using timestamps in each log entry, I need to calculate the time difference between the start of the Request and the end of the Response. As a first step, I thought I would identify the matching REQ/RES pairs in the log and then set about extracting the timestamp information and doing the calculations.

I started with a simple script to extract the Request IDs from each log entry. Here is what one looks like (names have been changed to protect the innocent):

[2009-01-23 09:20:48,719]TRACE [server-1] [http-80-5] a...@mydomain.net :090123-092048567:f5825 (SetCallForwardStatusImpl.java:call:54) - RequestId [81e80] SetCallForwardStatus.REQ { accountNumber:=W12345, phoneNumber:=12121212121, onBusyStatus:=true, busyCurrent:=voicemail, onNoAnswerStatus:=false, noAnswerCurent:=voicemail, onUncondStatus:=false, uncondCurrent:=voicemail }

So I need to extract the 5 hex digits in RequestId [81e80]. Sounds simple, eh?

Here is a fragment of my initial script:

open ( DATA, "$INBOX/sample.log" ) || die "Cannot open source file: $!";
open ( FILE, ">$INBOX/request.dat" ) || die "Cannot open request file: $!";
chomp(@list = <DATA>);
foreach $entry (@list) {
    $entry =~ /\[([a-z0-9]{5})\]/;
    print "$1\n";          # print to screen
    # print FILE "$1\n";   # print to file
}

I have spent quite a bit of time refining this expression and it looks OK to me. I basically just need to extract the 5-digit hex string and then write it to a file (or to screen).

This is what I get when I run the script:

Use of uninitialized value in concatenation (.) or string at ./magic.pl line 16, <DATA> line 1044.
8252c
Use of uninitialized value in concatenation (.) or string at ./magic.pl line 16, <DATA> line 1044.
8252c
Use of uninitialized value in concatenation (.) or string at ./magic.pl line 16, <DATA> line 1044.
8252d
Use of uninitialized value in concatenation (.) or string at ./magic.pl line 16, <DATA> line 1044.
8252d
Use of uninitialized value in concatenation (.) or string at ./magic.pl line 16, <DATA> line 1044.
82534
82534
Use of uninitialized value in concatenation (.) or string at ./magic.pl line 16, <DATA> line 1044.
82535
Use of uninitialized value in concatenation (.) or string at ./magic.pl line 16, <DATA> line 1044.
82534
82534
82534
Use of uninitialized value in concatenation (.) or string at ./magic.pl line 16, <DATA> line 1044.
8253c
8253c
8253c
Use of uninitialized value in concatenation (.) or string at ./magic.pl line 16, <DATA> line 1044
--- Big long list --- (note that RequestIDs from REQ/RES pairs need not be adjacent in the list)

The first thing that puzzles me is that, although it is obviously extracting the RequestId substring correctly, it complains about the "$1\n" expression in line 16. This looks quite OK to me and I am baffled why I am getting this message.

The other thing that puzzles me is that there can only be a single REQ/RES pair in the file with a given ID, so a RequestID should not appear more than twice in the output list. Yet there are many instances where a RequestID appears more than twice.

Any help you guys can provide would be much appreciated. The Perl version is 5.8.4 on Solaris 10.

Regards,
Bill Harpley
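Bill's stated end goal, pairing each REQ with its RES and computing the elapsed time, can be sketched as follows. This is a hypothetical illustration, not code from the thread: it assumes each record is already a single string (for example after joining continuation lines), that a REQ always precedes its RES, and that timestamps look like "[2009-01-25 02:21:13,760]". It uses the core module Time::Local and treats the timestamps as UTC, so a request spanning a daylight-saving change would be off by an hour. The helper names ts_to_ms and request_durations are made up.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: "2009-01-25 02:21:13,760" -> milliseconds since the epoch.
# The time is treated as UTC; differences are only wrong across a DST change.
sub ts_to_ms {
    my ($ts) = @_;
    my ( $y, $mon, $d, $h, $min, $s, $ms ) =
        $ts =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d),(\d{3})$/
        or die "Unrecognised timestamp: $ts\n";
    return timegm( $s, $min, $h, $d, $mon - 1, $y ) * 1000 + $ms;
}

# Hypothetical helper: pair REQ and RES records by RequestId and return
# a hash ref of { request_id => elapsed milliseconds }.
sub request_durations {
    my ( %start, %elapsed );
    for my $rec (@_) {
        next unless $rec =~ /^\[([^\]]+)\].*?RequestId \[([[:xdigit:]]{5})\] \S+\.(REQ|RES)\b/;
        my ( $ts, $id, $kind ) = ( $1, $2, $3 );    # copy before ts_to_ms runs another match
        if ( $kind eq 'REQ' ) {
            $start{$id} = ts_to_ms($ts);
        }
        elsif ( exists $start{$id} ) {
            my $begin = delete $start{$id};
            $elapsed{$id} = ts_to_ms($ts) - $begin;
        }
    }
    return \%elapsed;
}

# Shortened records based on the sample log posted later in the thread.
my @records = (
    '[2009-01-25 02:21:13,760]TRACE [server-1] - RequestId [8252c] LimitVoIPLine.REQ { accountNumber:=W1931627 }',
    '[2009-01-25 02:21:22,104]TRACE [server-1] - RequestId [8252c] LimitVoIPLine.RES { Result:=Success }',
    '[2009-01-25 02:21:34,675]TRACE [server-1] - RequestId [8252d] LimitVoIPLine.REQ { accountNumber:=W1931627 }',
    '[2009-01-25 02:21:41,354]TRACE [server-1] - RequestId [8252d] LimitVoIPLine.RES { Result:=Success }',
);

my $durations = request_durations(@records);
printf "%s took %d ms\n", $_, $durations->{$_} for sort keys %$durations;
```

For these records it reports 8344 ms for 8252c and 6679 ms for 8252d. Keeping the start times in a hash means REQ/RES pairs do not need to be adjacent in the log, which matches Bill's note about the list.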
Re: Simple regex problem has me baffled
On Mon, 2009-01-26 at 16:20 +0100, Bill Harpley wrote:
> foreach $entry(@list) {
>     $entry =~ /\[([a-z0-9]{5})\]/;
>     print "$1\n"; # print to screen
>     # print FILE "$1\n"; # print to file
> }

If there is no match, you are printing an uninitialized value; try:

foreach my $entry ( @list ){
    if( $entry =~ m{ \[ ( [a-z0-9]{5} ) \] }msx ){
        my $request_id = $1;
        # ...
    }
}

-- 
Just my 0.0002 million dollars worth,
   Shawn

"It would appear that we have reached the limits of what it is possible to achieve with computer technology, although one should be careful with such statements, as they tend to sound pretty silly in 5 years." --John von Neumann, circa 1960
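The behaviour Shawn and the others describe (a failed match does not reset $1) can be shown in a few lines. The sketch below is illustrative only: stale_capture_demo shows that, within one block, $1 still holds the capture from the last successful match after a later match fails, and guarded_id applies Shawn's guard so $1 is read only when the match just succeeded. Both sub names are hypothetical, and the /m and /s flags from Shawn's version are dropped because the pattern uses neither ^, $ nor dot.

```perl
use strict;
use warnings;

# Within a single block, a failed match does not reset $1: it still holds the
# capture from the last successful match.
sub stale_capture_demo {
    'RequestId [81e80] SetCallForwardStatus.REQ' =~ /\[([[:xdigit:]]{5})\]/;         # succeeds
    'accountNumber:=W12345, phoneNumber:=12121212121' =~ /\[([[:xdigit:]]{5})\]/;    # fails
    return $1;    # "81e80", left over from the first match
}

# Shawn's guard: only read $1 in the branch where the match succeeded.
sub guarded_id {
    my ($entry) = @_;
    if ( $entry =~ m{ \[ ( [[:xdigit:]]{5} ) \] }x ) {
        return $1;
    }
    return;    # no bracketed 5-digit hex ID on this line
}

print stale_capture_demo(), "\n";
my $id = guarded_id('accountNumber:=W12345, phoneNumber:=12121212121');
print defined $id ? "$id\n" : "no match\n";
```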
Re: Simple regex problem has me baffled
Bill Harpley wrote:
> Hello,

Hello,

> I have simple regex problem that is driving me crazy. ...snip...
> So I need to extract the 5 hex digits in RequestId [81e80]. Sounds simple, eh? Here is a fragment of my initial script:

You should have the warnings and strict pragmas at the beginning of your program to let perl help you find mistakes:

use warnings;
use strict;

> open ( DATA, "$INBOX/sample.log" ) || die "Cannot open source file: $!";
> open ( FILE, ">$INBOX/request.dat" ) || die "Cannot open request file: $!";
> chomp(@list = <DATA>);
> foreach $entry (@list) {

It looks like you don't really have to read the entire file into memory in order to process it. You should perhaps use a while loop instead, which will only read one line at a time:

while ( my $entry = <DATA> ) {
    chomp $entry;

And you may not need to chomp the current line if you are not accessing the data at the end of the line.

>     $entry =~ /\[([a-z0-9]{5})\]/;

You are looking for hexadecimal digits, so you want either [a-fA-F0-9] or [[:xdigit:]] instead.

>     print "$1\n"; # print to screen

The contents of $1 are only valid if the regular expression matched successfully; otherwise $1 retains the contents from the previous successful match.

    if ( $entry =~ /RequestId\s+\[([a-fA-F0-9]{5})\]/ ) {
        print "$1\n";
    }

>     # print FILE "$1\n"; # print to file
> }

John
-- 
Those people who think they know everything are a great annoyance to those of us who do. -- Isaac Asimov
Re: Simple regex problem has me baffled
Bill Harpley wrote:
> [2009-01-23 09:20:48,719]TRACE [server-1] [http-80-5] a...@mydomain.net :090123-092048567:f5825 (SetCallForwardStatusImpl.java:call:54) - RequestId [81e80] SetCallForwardStatus.REQ { accountNumber:=W12345, phoneNumber:=12121212121, onBusyStatus:=true, busyCurrent:=voicemail, onNoAnswerStatus:=false, noAnswerCurent:=voicemail, onUncondStatus:=false, uncondCurrent:=voicemail }

Is an entry divided into multiple lines? If so, and if the entries are separated by one or more empty lines, you probably want to enable paragraph mode.
http://perldoc.perl.org/perlvar.html#$INPUT_RECORD_SEPARATOR

> chomp(@list = <DATA>);

It seems to be unnecessary to read the whole log file into an array. chomp()ing seems to be unnecessary, too.

> $entry =~ /\[([a-z0-9]{5})\]/;

You'd better check whether the regex matches.

local $/ = ''; # paragraph mode
while ( my $entry = <DATA> ) {
    if ( $entry =~ /\[([a-z0-9]{5})]/ ) {
        print "$1\n";
    }
}

> The first thing that puzzles me is that it obviously extracting the RequestId substring correctly, it seems to complain about the "$1\n" expression in line 16. This looks quite OK to me and I am baffled why I am getting this message.

Probably because your code splits each entry into multiple @list elements.

> The other thing that puzzles me is that there can only be a single REQ/RES pair in the file with a given ID. So the RequestID should not appear more than twice in the output list. Yet there are many instances where the RequestID appears more than twice.

$1 retains its value from the latest successful match until the next time the regex matches successfully.

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
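Gunnar's paragraph-mode suggestion only helps if entries really are separated by blank lines. The sketch below, using a made-up two-entry sample and a hypothetical helper name, shows both cases: with a blank line between entries, each multi-line entry is read as one record; without it, the whole text becomes a single record and only the first RequestId is found. The second case is consistent with the single "8252c" Bill reports in his follow-up.

```perl
use strict;
use warnings;

# Hypothetical helper: read $text in paragraph mode and collect one RequestId per record.
sub ids_in_paragraph_mode {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory text: $!";
    local $/ = '';    # paragraph mode: a record ends at one or more blank lines
    my @ids;
    while ( my $entry = <$fh> ) {
        push @ids, $1 if $entry =~ /RequestId \[([[:xdigit:]]{5})\]/;
    }
    close $fh;
    return @ids;
}

# Made-up sample: two multi-line entries separated by a blank line.
my $separated = "- RequestId [8252c] LimitVoIPLine.REQ {\n  accountNumber:=W1931627\n}\n\n"
              . "- RequestId [8252c] LimitVoIPLine.RES {\n  Result:=Success\n}\n";
( my $unseparated = $separated ) =~ s/\n\n/\n/;    # same entries, no blank line between them

my @with_blank    = ids_in_paragraph_mode($separated);
my @without_blank = ids_in_paragraph_mode($unseparated);
print "with blank line: @with_blank\n";
print "without blank line: @without_blank\n";
```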
Re: Simple regex problem has me baffled
Bill Harpley wrote:
> Hello, I have simple regex problem that is driving me crazy. ...snip... The Perl version is 5.8.4. on solaris 10

I think I would write

while (<DATA>) {
    next unless /RequestId \[([[:xdigit:]]+)\]/;
    print "$1\n";
}

HTH,

Rob
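As for why Rob's loop works where the original did not: it tests each physical line and skips (next unless) every line on which the match fails, so $1 is only read right after a successful match on that same line. The original printed $1 unconditionally, including for continuation lines that carry no RequestId. The sketch below wraps Rob's loop in a hypothetical sub so it can read from an in-memory copy of a trimmed version of Bill's sample.

```perl
use strict;
use warnings;

# Rob's loop wrapped in a hypothetical sub so it can read any filehandle.
sub ids_from_handle {
    my ($fh) = @_;
    local $_;    # don't clobber the caller's $_
    my @ids;
    while (<$fh>) {
        next unless /RequestId \[([[:xdigit:]]+)\]/;    # skip lines with no RequestId
        push @ids, $1;                                   # safe: this line just matched
    }
    return @ids;
}

# Trimmed copy of the multi-line sample from the thread.
my $log = <<'LOG';
[2009-01-25 02:21:13,760]TRACE [server-1] [http-80-12] (LimitVoIPLineImpl.java:call:54)
- RequestId [8252c] LimitVoIPLine.REQ { accountNumber:=W1931627, phoneNumber:=1234512345 }
[2009-01-25 02:21:22,104]TRACE [server-1] [http-80-12] (LimitVoIPLineImpl.java:call:57)
- RequestId [8252c] LimitVoIPLine.RES { LimitVoIPLine Result { Result:=Success } }
LOG

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
print "$_\n" for ids_from_handle($fh);
close $fh;
```

Anchoring on "RequestId " also keeps other bracketed fields, such as [server-1], from ever being candidates.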
Regex problem with accented characters
Hi,

I am trying to extract the ISO code and country name from a 3-column table (taken from en.wikipedia.org) and have noticed a problem with accented characters such as Ô. Below is my script and a sample of the data I am using. When I run the script, the line beginning with CI for Côte d'Ivoire returns the string "CI\tC", whereas I had hoped for "CI\tCôte d'Ivoire". Does anyone know why \w+ doesn't match all of "Côte d'Ivoire", and how I can get around it in future?

TIA,
Dp.

extract.pl

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'iso-alpha2.txt';
open(FH, $file) or die "Can't open $file: $!\n";
while (<FH>) {
    chomp;
    next if ($_ !~ /^\w{2}\s+/);
    my ($code,$name) = ($_ =~ /^(\w{2})\s+(\w+\s\w+\s\w+s\w+|\w+\s\w+\s\w+|\w+\s\w+|\w+)/);
    print "$code\t$name\n";
}

=== sample data
...snip
BY Belarus Previously named Byelorussian S.S.R.
BZ Belize
CA Canada
CC Cocos (Keeling) Islands
CD Congo, the Democratic Republic of the Previously named Zaire ZR
CF Central African Republic
CG Congo
CH Switzerland Code taken from Confoederatio Helvetica, its official Latin name
CI Côte d'Ivoire
CK Cook Islands
CL Chile
CM Cameroon
===
Re: Regex problem with accented characters
On 03/27/2007 03:34 AM, Beginner wrote:
> Hi, I am trying to extract the iso code and country name from a 3 column table (taken from en.wikipedia.org) and have noticed a problem with accented characters such as Ô. ...snip...

It's partly the encoding. Put «use encoding "iso-8859-1";» at the top of your program, and there will be a little improvement. However, that only gets you as far as "Côte d"; I doubt there is any encoding where the apostrophe is in \w.

It's probably best to create an expression that contains all of the characters you may want. That would include accented characters and the apostrophe in this case.

Also, I advise you to use a programmer's editor that supports syntax highlighting. My VIM shows me that you missed the backslash that is supposed to be on the fourth \s in your regular expression.
Re: Regex problem with accented characters
Beginner wrote:
> Hi, I am trying to extract the iso code and country name from a 3 column table (taken from en.wikipedia.org) and have noticed a problem with accented characters such as Ô. ...snip...

Ordinarily the range of characters matched by \w is limited to [0-9A-Za-z_]. However, if you put 'use locale' at the start of your program this will be extended to include the accented alpha characters as well (see perldoc perllocale). However, this will still not solve your problem, as the apostrophe in Côte d'Ivoire will still not match \w and you will end up with "CI\tCôte d".

I suggest you change your regex to simply match any character at all up to the end of the line, like this:

while (<FH>) {
    chomp;
    next unless /^(\w\w)\s+(.+?)\s*$/;
    my ($code, $name) = ($1, $2);
    print "$code\t$name\n";
}

which will give the result you desire. But you still have the problem that the line for Zaire has no text and will not match the regex anyway!

Hope this helps.

Rob
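Rob's pattern can be checked on a few of the sample lines with a small wrapper. The sketch below is illustrative: code_and_name is a hypothetical name, and the lazy (.+?) followed by \s*$ takes everything after the code up to any trailing whitespace, so accented letters and the apostrophe need no special handling.

```perl
use strict;
use warnings;

# Rob's suggested pattern in a hypothetical helper: two word characters for the
# code, then everything up to (but not including) trailing whitespace.
sub code_and_name {
    my ($line) = @_;
    return unless $line =~ /^(\w\w)\s+(.+?)\s*$/;
    return ( $1, $2 );
}

for my $line ( "CI Côte d'Ivoire\n", "CK Cook Islands\n", "CM Cameroon\n" ) {
    my ( $code, $name ) = code_and_name($line) or next;
    print "$code\t$name\n";
}
```

One design caveat: this pattern also captures the third column (e.g. "Previously named ..."). If the real file is tab-separated (an assumption, since the paste lost the delimiters), (split /\t/, $line)[0, 1] would keep those notes out of the name.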
Re: Regex problem with accented characters
Beginner wrote:
> /^(\w{2})\s+(\w+\s\w+\s\w+s\w+|\w+\s\w+\s\w+|\w+\s\w+|\w+)/);

It's worth noting that this could be written:

/^(\w{2})\s+(\w+(?:\s\w+)*)/);

Rob
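Rob's compact form replaces the four hand-written alternatives with "one word, then any number of further whitespace-separated words", which also removes the chance of mistyping one branch (the missing backslash in \w+s\w+ noted earlier in the thread). A small check, using a hypothetical sub name:

```perl
use strict;
use warnings;

# Rob's compact form: one word, then zero or more further whitespace-separated words.
sub code_and_words {
    my ($line) = @_;
    return $line =~ /^(\w{2})\s+(\w+(?:\s\w+)*)/ ? ( $1, $2 ) : ();
}

for my $line ( 'CF Central African Republic', 'CK Cook Islands', 'CL Chile' ) {
    my ( $code, $name ) = code_and_words($line);
    print "$code\t$name\n";
}
```

Note that \s also matches a tab, so if the columns are tab-separated (an assumption), the repetition would run into the third column; using a literal space, [ ], instead of \s would stop it at the end of the name.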
Re: strange regex problem with backslash and newline
On 8/5/06, Peter Daum [EMAIL PROTECTED] wrote:
> $s='abc \
> ';
> $s =~ /^(.*[^\\])(\\)?$/;
> print "1: '$1', 2: '$2'";

Let's see what that pattern matches by annotating it:

m{
    ^        # start of string
    (        # memory 1
      .*     # any ol' junk, including backslashes
      [^\\]  # any non-backslash, including newlines
    )
    (\\)?    # optional backslash (memory 2)
    $        # end of string (or final newline at eos)
}x

> I would expect $1 to hold "abc" and $2 == "\\", but instead, the first grouping holds everything including the backslash and the following newline, while $2 is left undefined. the . obviously matched the newline at the end.

No, the . matched the backslash; the [^\\] matched the newline. Does that get you back on the right track?

Hope this helps!

--Tom Phoenix
Stonehenge Perl Training
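Tom's explanation can be confirmed by running both patterns on the same string. The sketch below builds the string Peter describes ("abc \" followed by a newline) and compares the original pattern with the one Peter says he intended, where the class also excludes "\n". The visible helper is hypothetical and only makes undef and newlines readable in the output.

```perl
use strict;
use warnings;

my $s = "abc \\\n";    # "abc \" followed by the newline read from the file

my @orig  = $s =~ /^(.*[^\\])(\\)?$/;      # [^\\] is allowed to match "\n"
my @fixed = $s =~ /^(.*[^\\\n])(\\)?$/;    # newline excluded from the class

# Make captures printable: show undef explicitly and mark newlines.
sub visible {
    my ($v) = @_;
    return 'undef' unless defined $v;
    $v =~ s/\n/<newline>/g;
    return "'$v'";
}

print 'original: 1: ', visible( $orig[0] ),  ', 2: ', visible( $orig[1] ),  "\n";
print 'fixed:    1: ', visible( $fixed[0] ), ', 2: ', visible( $fixed[1] ), "\n";
```

With the original pattern, .* consumes "abc \" and [^\\] takes the newline, leaving nothing for (\\)?; with the newline excluded, the engine backtracks until [^\\\n] matches the space, so the backslash lands in $2.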
Re: strange regex problem with backslash and newline
Tom Phoenix wrote: On 8/5/06, Peter Daum [EMAIL PROTECTED] wrote: $s =~ /^(.*[^\\])(\\)?$/; print "1: '$1', 2: '$2'"; Let's see what that pattern matches by annotating it: m{ ^ # start of string ( # memory 1 .* # any ol' junk, including backslashes [^\\] # any non-backslash, including newlines ... Ah ;-) I had somehow always assumed that not only . but also other constructs (like the [^\\], which really was intended as [^\\\n]) treat the newline specially, and that only \n or $ match the newline. That is certainly not something the Perl documentation says anywhere, but this was the first time I ever had a situation where it made a difference. Thanks a lot for the explanation! Regards, Peter Daum
Re: strange regex problem with backslash and newline
Peter Daum wrote: Hi, Hello, when trying to process continuation lines in a file, I ran into a weird phenomenon that I can't make any sense of: $s contains a line read from a file, that ends with a backslash (+ the newline character), so $s='abc \ '; $s =~ /^(.*)$/; print $1; # prints abc \ as expected If what you really want to do is put all the continuation lines on the same line then you can do it something like this: while ( my $s = <FILE> ) { if ( $s =~ s/\\\n/ / ) { $s .= <FILE>; redo; } # process complete line } John -- use Perl; program fulfillment
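A runnable sketch of John's loop. It assumes the data comes through an in-memory filehandle instead of FILE, and it adds an eof check (not in the original) so that a backslash on the very last line does not append undef:

```perl
use strict;
use warnings;

# Hypothetical input with two continuation lines.
my $text = "a = 1 \\\n    2 \\\n    3\nb = 4\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @logical;
while (my $s = <$fh>) {
    # A trailing backslash-newline means the logical line continues:
    # replace it with a space, append the next physical line, and rerun
    # the loop body (redo skips the while condition, so $s is kept).
    if ($s =~ s/\\\n/ / && !eof $fh) {
        $s .= <$fh>;
        redo;
    }
    chomp $s;
    push @logical, $s;    # process complete line
}
close $fh;
```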
XML Parsing error - regex problem?
Hi all, I'm getting the following XML parsing error: [Fri Mar 10 09:37:39 2006] insert_xml.pl: not well-formed (invalid token) at line 13628, column 24, byte 413248: [Fri Mar 10 09:37:39 2006] insert_xml.pl: <la>LA14</la> [Fri Mar 10 09:37:39 2006] insert_xml.pl: <seed>5741726</seed> [Fri Mar 10 09:37:39 2006] insert_xml.pl: <school_name>St. Patricks R.C. P.S.</school_name> [Fri Mar 10 09:37:39 2006] insert_xml.pl: ===^ [Fri Mar 10 09:37:39 2006] insert_xml.pl: <council>Falkirk</council> [Fri Mar 10 09:37:39 2006] insert_xml.pl: <ce>CE-511 (Edge)</ce> [Fri Mar 10 09:37:39 2006] insert_xml.pl: at /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi/XML/Parser.pm line 185 I've checked my XML file and it contains: <school_name>St. Patrick<92>s R.C. P.S.</school_name> This is because St. Patrick's contains an apostrophe. I have a couple of regexes to handle ampersands and apostrophes; however, the apostrophe regex doesn't appear to be working correctly: ampersand regex works: $data->[$i] =~ s/&/&#38;/g; apostrophe regex doesn't work: $data->[$i] =~ s/'/&apos;/g; Any ideas on this one? G :) P.S. Thank you to all who replied to my previous post, I got that array dereferenced properly.
RE: XML Parsing error - regex problem?
Hi all, I've worked out that the character is a type of apostrophe which has a hex value of 92. How would I write my regex to substitute this character for a normal apostrophe? I've tried: s/92/'/g; and it didn't work. Any ideas?
Re: XML Parsing error - regex problem?
On 3/10/06, Graeme McLaren [EMAIL PROTECTED] wrote: I've checked my XML file and it contains: <school_name>St. Patrick<92>s R.C. P.S.</school_name> This is because St. Patrick's contains an apostrophe. I'm guessing that where I see four characters <92>, the actual file has a single character. Some tools render unusual characters that way. I have a couple of regexes to handle ampersands and apostrophes, however the apostrophe regex doesn't appear to be working correctly: ampersand regex works: $data->[$i] =~ s/&/&/g; I'm not sure I know what you mean by "works". It seems to be replacing every ampersand with an ampersand in the target string, which would be a no-op if it didn't have side effects. apostrophe regex doesn't work: $data->[$i] =~ s/'/&apos;/g; It doesn't? It's probably matching any true apostrophes. I've worked out that the character is a type of apostrophe which has a hex value of 92. How would I write my regex to substitute this character for a normal apostrophe? I've tried: s/92/'/g; and it didn't work. I think you're looking for one of these: s/\x92/'/g s/\x92/&apos;/g tr/\x92/'/ Backslash escapes are documented in perlop. Hope this helps! --Tom Phoenix Stonehenge Perl Training
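A short sketch of Tom's suggestions, assuming the field holds the single byte 0x92 (in the Windows-1252 encoding that byte is a right single quotation mark). The sample string is made up:

```perl
use strict;
use warnings;

# Hypothetical field value containing byte 0x92 where the apostrophe should be.
my $name = "St. Patrick\x92s R.C. P.S.";

(my $plain  = $name) =~ tr/\x92/'/;          # plain ASCII apostrophe
(my $entity = $name) =~ s/\x92/&apos;/g;     # XML predefined entity instead
```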
Re: regex problem
On 2/15/06, anand kumar [EMAIL PROTECTED] wrote: John W. Krahn [EMAIL PROTECTED] wrote: anand kumar wrote: Hi all, Hello, I have the following problem in the following regex replace. $line=~s!\b($name)\b!$1!g; here this regex finds the exact matching of the content in $name and does the needed but in some examples the variable $name may contain backslash characters like 'gene\l=s\' , in this type of cases the replace string does not work so i have removed '\b' on either side and used the following $line=~s!(\Q$name\E)!$1!g; This works fine but the problem is that the replacement is not done on the exact word but also on substrings which is unnecessary. if i use both \b\b and \Q\E then the code fails to replace. please send suggestions in this regard $line=~s!\b(\Q$name\E)\b!$1!g; Try this: If $name is a single-quoted string: $name = quotemeta($name); $line =~ s|($name)|au$1|; If $name is a double-quoted string: $name = quotemeta(quotemeta($name)); $line =~ s|($name)|au$1|; It's preferable, though, for $name to be single-quoted, because Perl will do some interpolation at the time the string is saved, and depending on your system, strange things can happen. For instance, the following are not all equal: $name = quotemeta("\aball"); # $name gets '\\x07ball' $name = '\aball'; # $name gets '\aball' $name = "\aball"; $name = quotemeta($name); # $name gets '\\x07ball' This is because the double-quoted string is interpolated before it is assigned to a variable or passed to a function and the metacharacter--in this case '\a', the escape sequence for the ASCII bell character--is already interpolated. HTH, -- jay -- This email and attachment(s): [ ] blogable; [ x ] ask first; [ ] private and confidential daggerquill [at] gmail [dot] com http://www.tuaw.com http://www.dpguru.com http://www.engatiki.org values of β will give rise to dom!
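A quick way to see the single- versus double-quote difference Jay describes. The sample strings are illustrative; the lengths and quotemeta results follow from Perl's quoting rules:

```perl
use strict;
use warnings;

my $single = '\aball';   # 6 characters: backslash, a, b, a, l, l
my $double = "\aball";   # 5 characters: BEL (0x07), then "ball"

my $q_single = quotemeta $single;   # the backslash gets escaped
my $q_double = quotemeta $double;   # the BEL character gets escaped

# \Q...\E applies the same quoting inside a pattern.
my $found = 'x\aball y' =~ /\Q$single\E/ ? 1 : 0;
```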
Re: regex problem
John W. Krahn [EMAIL PROTECTED] wrote: anand kumar wrote: Hi all, Hello, I have the following problem in the following regex replace. $line=~s!\b($name)\b!$1!g; here this regex finds the exact matching of the content in $name and does the needed but in some examples the variable $name may contain backslash characters like 'gene\l=s\' , in this type of cases the replace string does not work so i have removed '\b' on either side and used the following $line=~s!(\Q$name\E)!$1!g; This works fine but the problem is that the replacement is not done on the exact word but also on substrings which is unnecessary. if i use both \b\b and \Q\E then the code fails to replace. please send suggestions in this regard $line=~s!\b(\Q$name\E)\b!$1!g; John -- use Perl; program fulfillment Hi john, i have tried the above method but the replacement is done. regards anand
regex problem
Hi all, I have the following problem in the following regex replace. $line=~s!\b($name)\b!au$1!g; here this regex finds the exact matching of the content in $name and does the needed but in some examples the variable $name may contain backslash characters like 'gene\l=s\' , in this type of cases the replace string does not work so i have removed '\b' on either side and used the following $line=~s!(\Q$name\E)!au$1!g; This works fine but the problem is that the replacement is not done on the exact word but also on substrings which is unnecessary. if i use both \b\b and \Q\E then the code fails to replace. please send suggestions in this regard Thanks in advance for the help Regards Anand
Re: regex problem
anand kumar wrote: Hi all, Hello, I have the following problem in the following regex replace. $line=~s!\b($name)\b!au$1!g; here this regex finds the exact matching of the content in $name and does the needed but in some examples the variable $name may contain backslash characters like 'gene\l=s\' , in this type of cases the replace string does not work so i have removed '\b' on either side and used the following $line=~s!(\Q$name\E)!au$1!g; This works fine but the problem is that the replacement is not done on the exact word but also on substrings which is unnecessary. if i use both \b\b and \Q\E then the code fails to replace. please send suggestions in this regard $line=~s!\b(\Q$name\E)\b!au$1!g; John -- use Perl; program fulfillment
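Why \b together with \Q...\E can fail here: \b only matches between a word character and a non-word character. The name gene\l=s\ ends in a backslash, which is itself a non-word character, so there is no boundary between it and a following space. One alternative, not proposed in the thread, is to require whitespace (or the edge of the string) on both sides using lookarounds. The sketch below assumes names are whitespace-delimited tokens; the square brackets are a hypothetical stand-in for whatever markup the real replacement adds:

```perl
use strict;
use warnings;

my $name = 'gene\l=s\\';                          # the text gene\l=s\
my $line = 'see gene\l=s\ and xgene\l=s\ here';   # hypothetical input

# \b after the trailing backslash needs a word character next: no match at all.
(my $with_b = $line) =~ s/\b(\Q$name\E)\b/[$1]/g;

# (?<!\S) and (?!\S): not preceded / not followed by a non-space character.
# Only the standalone token is replaced; "xgene\l=s\" is left alone.
(my $with_ws = $line) =~ s/(?<!\S)(\Q$name\E)(?!\S)/[$1]/g;
```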
Regex Problem.
I am at a loss here to generate REGEX for my problem. I have an input query coming to my cgi script, containing a word (with or without spaces e.g. blood Globin Test etc). What I am trying to do is to split this word (maximum of 3 characters) and find the BEST possible matching words within mySQL database. For example if the word is blood I want to get results using regex: for blood: check(blo) then check(loo) check(ood) for Globin Test: check(Glo) then check(lob) check(obi) check(bin) check(Tes) check(est) TIA. Sara. sub check { my $check = $dbh->prepare("SELECT * FROM medical WHERE def LIKE '%$query%'"); $check->execute(); while (my @row = $check->fetchrow_array()) { print "blah blah blah\n"; } }
Re: Regex Problem.
Sara wrote: I am at a loss here to generate REGEX for my problem. I have an input query coming to my cgi script, containing a word (with or without spaces e.g. blood Globin Test etc). What I am trying to do is to split this word (maximum of 3 characters) and find the BEST possible matching words within mySQL database. For example if the word is blood I want to get results using regex: for blood: check(blo) then check(loo) check(ood) for Globin Test: check(Glo) then check(lob) check(obi) check(bin) check(Tes) check(est) TIA. Sounds like you need a split then a substr rather than a regex. I suppose a regex would work if you really wanted one, but I wouldn't. perldoc -f split perldoc -f substr It will also be faster to combine everything into one select rather than running one for each possible token, but at the least, if you are going to do multiple selects, use 'prepare' with placeholders and only prepare the query once. So, -- UNTESTED -- my @tokens = split ' ', $entry; my @words; foreach my $token (@tokens) { push @words, substr $token, 0, 3; push @words, substr $token, -3, 3; } (or you can put the following into the above foreach however you would like) my $where = ''; my @bind; foreach my $word (@words) { $where .= ' OR ' if $where ne ''; $where .= "(def LIKE ?)"; push @bind, "%$word%"; } my $sth = $dbh->prepare("SELECT * FROM medical WHERE $where"); $sth->execute(@bind); while (my @row = $sth->fetchrow_array) { print join ' ', @row; print "\n"; } This also prevents SQL injection by quoting the query words properly. Sara. http://danconia.org sub check { my $check = $dbh->prepare("SELECT * FROM medical WHERE def LIKE '%$query%'"); $check->execute(); while (my @row = $check->fetchrow_array()) { print "blah blah blah\n"; } }
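Wiggins' approach, reduced to the parts that can run without a database. The input is hypothetical and the DBI calls appear only as a comment. Note that this collects only the first and last three characters of each word, not every three-character window:

```perl
use strict;
use warnings;

my $entry = 'blood Globin Test';   # hypothetical user input

my @words;
for my $token (split ' ', $entry) {
    push @words, substr($token, 0, 3);    # first three characters
    push @words, substr($token, -3, 3);   # last three characters
}

# One "(def LIKE ?)" per word; the values go into @bind, never into the SQL text.
my $where = join ' OR ', ('(def LIKE ?)') x @words;
my @bind  = map { "%$_%" } @words;
my $sql   = "SELECT * FROM medical WHERE $where";
# With DBI: my $sth = $dbh->prepare($sql); $sth->execute(@bind);
```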
Re: Regex Problem.
That worked like a charm. You ALL are great. Thanks everyone for help. Sara. - Original Message - From: [EMAIL PROTECTED] To: 'Sara' [EMAIL PROTECTED] Sent: Thursday, August 18, 2005 10:50 PM Subject: RE: Regex Problem. Hi Sara, what about something like $string = 'blood'; for ($i = 0; $i <= length($string) - 3; $i++) { check(substr($string, $i, 3)); } Kind regards, your echtwahr.Webmaster http://www.echtwahr.de http://www.echtwahr.com -Original Message- From: Sara [mailto:[EMAIL PROTECTED] Sent: Thursday, August 18, 2005 5:48 PM To: beginners-cgi@perl.org Subject: Regex Problem. I am at a loss here to generate REGEX for my problem. I have an input query coming to my cgi script, containing a word (with or without spaces e.g. blood Globin Test etc). What I am trying to do is to split this word (maximum of 3 characters) and find the BEST possible matching words within mySQL database. For example if the word is blood I want to get results using regex: for blood: check(blo) then check(loo) check(ood) for Globin Test: check(Glo) then check(lob) check(obi) check(bin) check(Tes) check(est) TIA. Sara. sub check { my $check = $dbh->prepare("SELECT * FROM medical WHERE def LIKE '%$query%'"); $check->execute(); while (my @row = $check->fetchrow_array()) { print "blah blah blah\n"; } }
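The sliding-window loop from the reply above, generalized to several words. The helper name trigrams is made up for this sketch. Splitting on whitespace first keeps windows from spanning the gap between words, which matches the list Sara gave:

```perl
use strict;
use warnings;

# Every overlapping three-character substring of each whitespace-separated word.
sub trigrams {
    my ($string) = @_;
    my @out;
    for my $word (split ' ', $string) {
        # Words shorter than 3 characters give an empty range and are skipped.
        for my $i (0 .. length($word) - 3) {
            push @out, substr($word, $i, 3);
        }
    }
    return @out;
}

my @t = trigrams('Globin Test');
print "@t\n";
```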
regex problem
The following is not returning what I had expected... SUN1-BATCH>perl -e '$a=q{/var/run}; $home=q{/var/123};print "Yes - $a like $home\n" if $a =~ /^$home/;' SUN1-BATCH>perl -e '$a=q{/var/run}; $home=q{/var/ra};print "Yes - $a like $home\n" if $a =~ /^$home/;' SUN1-BATCH>perl -e '$a=q{/var/run}; $home=q{/var/ru};print "Yes - $a like $home\n" if $a =~ /^$home/;' Yes - /var/run like /var/ru I would have assumed that /var/run would NOT be like /var/ru just as /var/run is not like /var/ra... John W Moon
Re: regex problem
On Fri, 1 Jul 2005, Moon, John wrote: The following is not returning what I had expected... $a= q{/var/run}; $home = q{/var/ru}; print "Yes - $a like $home\n" if $a =~ /^$home/; I would have assumed that /var/run would NOT be like /var/ru just as /var/run is not like /var/ra... It depends on what you mean by "like". In this case, the string in $home also appears at the start of $a, so in that sense they are alike, and the code is doing the right thing. If you want to verify that $a and $home are identical, it would be easier to just check if one `eq` the other, as print "Yes - $a like $home\n" if $a eq $home; That test will work if $home is '/var/run', but will fail on '/var/ru'. -- Chris Devers
Re: regex problem
Moon, John [MJ], on Friday, July 1, 2005 at 11:30 (-0400) contributed this to our collective wisdom: MJ I would have assumed that /var/run would NOT be like /var/ru just as MJ /var/run is not like /var/ra... Is /var/ru at the beginning of /var/run? Yes. -- ...m8s, cu l8r, Brano. [If they get too annoying then we'll just have to get violent...]
Re: regex problem
On 7/1/05, Moon, John [EMAIL PROTECTED] wrote: The following is not returning what I had expected... SUN1-BATCH>perl -e '$a=q{/var/run}; $home=q{/var/123};print "Yes - $a like $home\n" if $a =~ /^$home/;' SUN1-BATCH>perl -e '$a=q{/var/run}; $home=q{/var/ra};print "Yes - $a like $home\n" if $a =~ /^$home/;' SUN1-BATCH>perl -e '$a=q{/var/run}; $home=q{/var/ru};print "Yes - $a like $home\n" if $a =~ /^$home/;' Yes - /var/run like /var/ru I would have assumed that /var/run would NOT be like /var/ru just as /var/run is not like /var/ra... John W Moon John A regex match checks to see if the specified pattern appears in the specified string. And the answer to the question "is /var/ru in /var/run?" is yes. Or to put it another way: $a =~ /$home/ is functionally (although not procedurally) equivalent to: $a =~ /^.*$home.*$/ If you want to do a simple test for equality, use 'eq'. If you're going to test for a pattern and want to match on the entire string, anchor the pattern at the beginning and end of the string: $a =~ /^$home$/ but if $home is a simple string without regex metacharacters, 'eq' is going to be a lot faster than m//. HTH, -- jay
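A sketch comparing prefix matching, whole-string matching and eq for the paths in this thread. It uses \Q...\E so that a . in $home is taken literally, and \z rather than $, because $ also matches just before a trailing newline. The extra path /var/r.n is an invented example of a metacharacter slipping through:

```perl
use strict;
use warnings;

my $path = '/var/run';
my %result;

for my $home ('/var/ru', '/var/run', '/var/r.n') {
    $result{$home} = [
        ($path =~ /^\Q$home\E/   ? 1 : 0),  # $path starts with $home
        ($path =~ /^\Q$home\E\z/ ? 1 : 0),  # $path is exactly $home
        ($path =~ /^$home\z/     ? 1 : 0),  # unquoted: "." matches any character
        ($path eq $home          ? 1 : 0),  # plain string equality
    ];
    print "$home: @{ $result{$home} }\n";
}
```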
one-liner multi-line regex problem
I'm trying to write a perl one-liner that will edit an iCalendar format file to remove To Do items. The file contains several thousand lines, and I need to remove several multi-line blocks. The blocks to remove start with a line "BEGIN:VTODO" (without the quotes) and end with a line "END:VTODO" (also without quotes). I've tried the following one-liner, perl -p -i.bak -e 's/BEGIN:VTODO.*END:VTODO//sg' file_name_to_edit The .bak file is created, which tells me the one-liner is finding my file, but the file is identical to the old one - i.e. the regex doesn't seem to be matching anything. I'm also wondering whether my proposed one-liner (if it worked) would be too greedy. Would it pull out everything between the first BEGIN:VTODO and the last END:VTODO? I'd appreciate any hints. Thanks, Kevin Horton
Re: one-liner multi-line regex problem
Hi Kevin, just hints, no solution :-) On Monday, 25 April 2005 12:59, Kevin Horton wrote: I'm trying to write a perl one-liner that will edit an iCalendar format file to remove To Do items. The file contains several thousand lines, and I need to remove several multi-line blocks. The blocks to remove start with a line "BEGIN:VTODO" (without the quotes) and end with a line "END:VTODO" (also without quotes). I've tried the following one-liner, perl -p -i.bak -e 's/BEGIN:VTODO.*END:VTODO//sg' file_name_to_edit According to perldoc perlrun, -p reads _one_ line after the other, so you can't search for multiline patterns this way. The .bak file is created, which tells me the one-liner is finding my file, but the file is identical to the old one - i.e. the regex doesn't seem to be matching anything. I'm also wondering whether my proposed one-liner (if it worked) would be too greedy. Yes or no; it depends on the working implementation :-) Would it pull out everything between the first BEGIN:VTODO and the last END:VTODO? Yes, if you try to match a string with the whole file in it with the regex above. I'd appreciate any hints. Thanks, Kevin Horton
Re: one-liner multi-line regex problem
I'm trying to write a perl one-liner that will edit an iCalendar format file to remove To Do items. The file contains several thousand lines, and I need to remove several multi-line blocks. The blocks to remove start with a line "BEGIN:VTODO" (without the quotes) and end with a line "END:VTODO" (also without quotes). I've tried the following one-liner, perl -p -i.bak -e 's/BEGIN:VTODO.*END:VTODO//sg' file_name_to_edit Assuming you have enough disk space: perl -ane 'print unless /^BEGIN:VTODO/ .. /^END:VTODO/' old > new See perldoc perlrun for more info on perl's command line parameters.
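The flip-flop idea can be tried on an in-memory list instead of a file. The iCalendar fragment is invented. In scalar context, .. stays true from a line matching the left pattern through the next line matching the right pattern, inclusive:

```perl
use strict;
use warnings;

# Hypothetical iCalendar fragment, one element per line.
my @lines = (
    "BEGIN:VCALENDAR\n",
    "BEGIN:VTODO\n", "SUMMARY:buy milk\n", "END:VTODO\n",
    "BEGIN:VEVENT\n", "SUMMARY:meeting\n", "END:VEVENT\n",
    "END:VCALENDAR\n",
);

# Lines inside a BEGIN:VTODO ... END:VTODO block (inclusive) are dropped.
my @kept = grep { !(/^BEGIN:VTODO/ .. /^END:VTODO/) } @lines;
print @kept;
```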
Re: one-liner multi-line regex problem
On 25-Apr-05, at 10:06 AM, Jay Savage wrote: On 4/25/05, Kevin Horton [EMAIL PROTECTED] wrote: I'm trying to write a perl one-liner that will edit an iCalendar format file to remove To Do items. The file contains several thousand lines, and I need to remove several multi-line blocks. The blocks to remove start with a line "BEGIN:VTODO" (without the quotes) and end with a line "END:VTODO" (also without quotes). I've tried the following one-liner, perl -p -i.bak -e 's/BEGIN:VTODO.*END:VTODO//sg' file_name_to_edit The .bak file is created, which tells me the one-liner is finding my file, but the file is identical to the old one - i.e. the regex doesn't seem to be matching anything. -p causes the file to be read one line at a time, which negates the usefulness of /s. If you have sufficient RAM to read the entire file into memory, you can use the -0 option to slurp the file: perl -0777 -p -i.bak -e 's/BEGIN:VTODO.*?END:VTODO//sg' This seems to work perfectly. I've studied the output for five minutes, and can't find a problem. Thank you very much. See perldoc perlrun for details. I've learned a lot in the last few minutes, now that I know which of the perldoc files to look in. I'm also wondering whether my proposed one-liner (if it worked) would be too greedy. Would it pull out everything between the first BEGIN:VTODO and the last END:VTODO? Yes it will. I looked at trying to use the ? to stop the potential greediness, but I didn't grok how it worked. Now that I have an example, I think I understand it (again, as I thought I understood when I was first puzzling through perl on vacation in Christmas 2003). Hopefully my understanding this time is more lasting. :) Thanks so much to the several people who responded. Kevin Horton Ottawa, Canada RV-8 - Finishing Kit http://www.kilohotel.com/rv8
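A sketch of why the non-greedy .*? matters once the whole file is in one string. The sample calendar is made up, and these patterns also consume the newline after END:VTODO, which the one-liner in the thread leaves behind as an empty line:

```perl
use strict;
use warnings;

my $ics = join '',
    "BEGIN:VCALENDAR\n",
    "BEGIN:VTODO\nSUMMARY:one\nEND:VTODO\n",
    "BEGIN:VEVENT\nSUMMARY:keep\nEND:VEVENT\n",
    "BEGIN:VTODO\nSUMMARY:two\nEND:VTODO\n",
    "END:VCALENDAR\n";

# Greedy .* runs to the LAST END:VTODO and swallows the event in between.
(my $greedy = $ics) =~ s/BEGIN:VTODO.*END:VTODO\n//s;

# Non-greedy .*? stops at the FIRST END:VTODO; /g repeats for each block.
(my $lazy = $ics) =~ s/BEGIN:VTODO.*?END:VTODO\n//sg;
```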
Re: YA Regex problem: lookahead assertion
Offer Kaye wrote on 23.03.2005: Change your RE to: m#<h1>(.+?)</h1>(.+?)(?=<h1>|$)#gs In other words, look ahead to either an <h1> or the end of the string ($). I have to admit this problem wasn't as simple as I initially thought - I still have no idea why my first guess didn't work: m#<h1>(.+?)</h1>(.+?)(?=<h1>)?#gs Maybe someone with more knowledge of REs can answer? John W. Krahn wrote on 23.03.2005: This should work (untested) while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>|\z)#gs) { Hi, and thanks. I tried Offer Kaye's first guess, too, and I think I can explain why it does not work. If you make the lookahead optional, the regex will try to match as few characters as possible for the second parentheses - and since the lookahead is optional, this will be only a single character. You have to force a positive lookahead assertion to make sure $2 receives everything up to either the next <h1> or the end of the string. So the other suggestion works. Thank you! The reason I had not tried that was the wrong assumption that alternations in lookahead/lookbehind assertions had to be of the same length, like in (?=abc|def), but not (?=abc|defg). But now I remember that the whole lookahead/lookbehind has to be of a fixed length, so you cannot use quantifiers. Thanks again, Jan -- A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools.
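A sketch contrasting the required lookahead with the optional one Jan analyzes. The sample content is invented. The no warnings 'regexp' line only silences Perl's complaint about a quantifier on a zero-length assertion. Using \z instead of $ keeps the final newline in the last body; with $ the lazy capture would stop just before it:

```perl
use strict;
use warnings;

my $content = "<h1>One</h1>first\n<h1>Two</h1>second\n";

# Required lookahead: each body runs up to the next <h1> or the very end (\z).
my @ok;
while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>|\z)#gs) {
    push @ok, [$1, $2];
}

# Optional lookahead: (?=<h1>)? may match nothing, so the lazy (.+?)
# is already satisfied after a single character.
my @short;
{
    no warnings 'regexp';
    while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>)?#gs) {
        push @short, $2;
    }
}
```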
Re: YA Regex problem: lookahead assertion
Jan Eden wrote: John W. Krahn wrote on 23.03.2005: This should work (untested) while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>|\z)#gs) { and thanks. I tried Offer Kaye's first guess, too, and I think I can explain why it does not work. If you make the lookahead optional, the regex will try to match as few characters as possible for the second parentheses - and since the lookahead is optional, this will be only a single character. You have to force a positive lookahead assertion to make sure $2 receives everything up to either the next <h1> or the end of the string. So the other suggestion works. Thank you! The reason I had not tried that was the wrong assumption that alternations in lookahead/lookbehind assertions had to be of the same length, like in (?=abc|def), but not (?=abc|defg). But now I remember that the whole lookahead/lookbehind has to be of a fixed length, so you cannot use quantifiers. lookahead CAN use quantifiers but lookbehind CANNOT. John -- use Perl; program fulfillment
YA Regex problem: lookahead assertion
Hi, I use the following regex to split a (really simple) file into sections headed by <h1>.+?</h1>: while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>)#gs) { ... } This works perfectly, but obviously does not catch the last section, as it is not followed by <h1>. How can I catch the last section without * doing a separate match for it * losing the convenience of the g switch to wade through the whole file? Thanks, Jan -- I'd never join any club that would have the likes of me as a member. - Groucho Marx
Re: YA Regex problem: lookahead assertion
On Wed, 23 Mar 2005 17:06:59 +0100, Jan Eden wrote: Hi, I use the following regex to split a (really simple) file into sections headed by <h1>.+?</h1>: while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>)#gs) { ... } This works perfectly, but obviously does not catch the last section, as it is not followed by <h1>. How can I catch the last section without * doing a separate match for it * losing the convenience of the g switch to wade through the whole file? Thanks, Jan Change your RE to: m#<h1>(.+?)</h1>(.+?)(?=<h1>|$)#gs In other words, look ahead to either an <h1> or the end of the string ($). I have to admit this problem wasn't as simple as I initially thought - I still have no idea why my first guess didn't work: m#<h1>(.+?)</h1>(.+?)(?=<h1>)?#gs Maybe someone with more knowledge of REs can answer? Regards, -- Offer Kaye
RE: YA Regex problem: lookahead assertion
Jan Eden mailto:[EMAIL PROTECTED] wrote: : Hi, : : I use the following regex to split a (really simple) file into : sections headed by <h1>.+?</h1>: : : while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>)#gs) { : ... : } The answer may be in your description. Use 'split'. When you use a capture inside the regular expression in 'split', the capture is returned. @content is 'shift'ed to get rid of the first element returned by split, which is empty unless there is something before the first <h1>. #!/usr/bin/perl use strict; use warnings; use Data::Dumper 'Dumper'; my $content = do { local $/ = undef; <DATA>; }; my @content = split m|<h1>(.+?)</h1>|, $content; shift @content; print Dumper \@content; __END__ <h1>heading 1</h1> Some stuff <h1>heading 2</h1> Some stuff <h1>heading 3</h1> Some stuff <h1>heading 4</h1> Some stuff HTH, Charles K. Clarkson -- Mobile Homes Specialist 254 968-8328
Re: YA Regex problem: lookahead assertion
Jan Eden wrote:
Hi,

Hello,

I use the following regex to split a (really simple) file into sections headed by <h1>.+?</h1>:
while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>)#gs) {
    ...
}
This works perfectly, but obviously does not catch the last section, as it is not followed by <h1>. How can I catch the last section without
* doing a separate match for it
* losing the convenience of the g switch to wade through the whole file?

This should work (untested):

while ($content =~ m#<h1>(.+?)</h1>(.+?)(?=<h1>|\z)#gs) {

John
-- 
use Perl;
program
fulfillment
Re: A regex problem.
[EMAIL PROTECTED] (Denham Eva) wrote in news:[EMAIL PROTECTED]:
Hello Gurus,
In a script I have a piece of code as such:
*** snip ***
my $filedate =~ s/(\d+)//g;
*** snip end ***
The data I am parsing looks as such:
*** DATA ***
C:/directory/MSISExport_20040814.csv
C:/directory/MSISExport_20040813.csv
.
C:/directory/MSISExport_20030501.csv
*** DATA end ***
Now I am actually trying to dump everything except the date or numerals, as such:
20040814
Can someone help me with that regex? I am having a frustrating time of it!
Much appreciated
Denham

Denham,
If you have the filename in a scalar called $filename, then the code to place the date into a scalar called $filedate would be:

($filedate) = $filename =~ m|MSISExport_([0-9]+)\.csv|;

This places the value captured by ([0-9]+) into $filedate. It doesn't catch an instance where the filename is malformed and the date doesn't exist there.
Hope this helps,
Bill
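A sketch of the capture-in-list-context idiom from Bill's answer, wrapped in a hypothetical `file_date` helper. It assumes the date is always exactly eight digits; the helper returns `undef` when a name does not fit, which covers the malformed-filename case. The last lines show, for contrast, that an `s///` on a copy deletes the date instead of keeping it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 8-digit date from a name such as
# C:/directory/MSISExport_20040814.csv, or undef if the name doesn't fit.
sub file_date {
    my ($filename) = @_;
    # In list context a successful match returns the captured groups;
    # a failed match returns an empty list, leaving $date undefined.
    my ($date) = $filename =~ m{MSISExport_([0-9]{8})\.csv\z};
    return $date;
}

my $path = 'C:/directory/MSISExport_20040814.csv';
my $date = file_date($path);
print defined $date ? "$date\n" : "no date in $path\n";

# For contrast: s/// on a copy removes the date instead of keeping it.
(my $stripped = $path) =~ s/_\d+//;
print "$stripped\n";    # C:/directory/MSISExport.csv
```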
A regex problem.
Hello Gurus,
In a script I have a piece of code as such:
*** snip ***
my $filedate =~ s/(\d+)//g;
*** snip end ***
The data I am parsing looks as such:
*** DATA ***
C:/directory/MSISExport_20040814.csv
C:/directory/MSISExport_20040813.csv
. . . .
C:/directory/MSISExport_20030501.csv
*** DATA end ***
Now I am actually trying to dump everything except the date or numerals, as such:
20040814
Can someone help me with that regex? I am having a frustrating time of it!
Much appreciated
Denham
RE: A regex problem.
Hi,
Try in this way. Just remove my, you will get it.

$filedate = "C:/directory/MSISExport_20040814.csv";
($filedate) =~ s/(\_\d+)//g;
print "$filedate\n";

Thank you
jaffer

-----Original Message-----
From: Denham Eva [mailto:[EMAIL PROTECTED]
Sent: Monday, September 06, 2004 6:11 PM
To: [EMAIL PROTECTED]
Subject: A regex problem.
Re: A regex problem.
Denham Eva wrote:
Hello Gurus,
In a script I have a piece of code as such:
*** snip ***
my $filedate =~ s/(\d+)//g;

Try this instead:

my $filedate;
if ( $var_with_file_name =~ m/(\d+)\.csv$/ ) {
    $filedate = $1;
}
print "$filedate\n";

-- 
Flemming Greve Skovengaard                    Man still has one belief,
a.k.a Greven, TuxPower                        One decree that stands alone
[EMAIL PROTECTED]                              The laying down of arms
4112.38 BogoMIPS                              Is like cancer to their bones
Re: A regex problem.
Jaffer Shaik wrote:
Try in this way. Just remove my, you will get it.

What kind of stupid advice is that?

$filedate = "C:/directory/MSISExport_20040814.csv";
($filedate) =~ s/(\_\d+)//g;

Leaving aside that the parentheses are redundant, that does the opposite of what the OP asked for.
-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Re: A regex problem.
Denham Eva [DE], on Monday, September 6, 2004 at 14:41 (+0200) typed:
DE> my $filedate =~ s/(\d+)//g;
DE> ** DATA
DE> C:/directory/MSISExport_20040814.csv
DE> C:/directory/MSISExport_20040813.csv
DE> Can someone help me with that regex? I am having a frustrating time of

I hope this helps you:

use strict;
for (<DATA>) {
    print "$1\n" if /MSISExport_(\d+)\.csv$/gi;
}
__DATA__
C:/directory/MSISExport_20040814.csv
C:/directory/MSISExport_20040816.csv
C:/directory/MSISExport_20040817.csv
C:/directory/MSISExport_20040824.csv

-- 
...m8s, cu l8r, Brano.
[Paragraph. Paragraph. Paragraph. Paragraph. Paragraph. -David Moser]
RE: regex problem
cool thanks
I guess I am a wannabe programmer but do UNIX in real life. So Data::Dumper shows me the structure of any scalar? Could you show me an example?
thank you,
Derek B. Smith
OhioHealth IT
UNIX / TSM / EDM Teams

Charles K. Clarkson [EMAIL PROTECTED]
08/09/2004 06:31 PM
To: [EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: RE: regex problem
RE: regex problem
On Tue, 10 Aug 2004 [EMAIL PROTECTED] wrote:
So Data::Dumper shows me the structure of any scalar? Could you show me an example?

Data::Dumper is a tool for showing the structure of *any* data. As is often the case, the perldoc has some of the best documentation:

perldoc Data::Dumper

It starts out with this:

NAME
    Data::Dumper - stringified perl data structures, suitable for both printing and eval

SYNOPSIS
    use Data::Dumper;

    # simple procedural interface
    print Dumper($foo, $bar);

    # extended usage with names
    print Data::Dumper->Dump([$foo, $bar], [qw(foo *ary)]);

    # configuration variables
    {
        local $Data::Dumper::Purity = 1;
        eval Data::Dumper->Dump([$foo, $bar], [qw(foo *ary)]);
    }

    # OO usage
    $d = Data::Dumper->new([$foo, $bar], [qw(foo *ary)]);
    ...
    print $d->Dump;
    ...
    $d->Purity(1)->Terse(1)->Deepcopy(1);
    eval $d->Dump;

And it goes on to describe usage details and more examples.
Good luck with it!
-- 
Chris Devers
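A short example along the lines Derek asked for, dumping the two `$baz` values from Charles's message in this thread (renamed `$nested` and `$flat` here for illustration). Setting `Terse` drops the `$VAR1 = ` prefix and `Useqq` prints newlines as `\n` escapes, which makes the difference between the two structures easy to see.

```perl
use strict;
use warnings;
use Data::Dumper;

# Two values that could both be described as "a scalar with strings in it".
my $nested = [ ['foo'], ['bar'] ];   # reference to an array of array refs
my $flat   = "foo\nbar\n";           # one string containing newlines

local $Data::Dumper::Terse = 1;      # omit the "$VAR1 = " prefix
local $Data::Dumper::Useqq = 1;      # show "\n" escapes inside strings

print Dumper($nested);
print Dumper($flat);
```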
regex problem
All,
I am getting the error from my if statement:

^* matches null string many times in regex; marked by <-- HERE in m/^* <-- HERE Orig/ at .

I am trying to get everything except *Orig in this output:

*Orig Vol: 1703FBBDED58D4AD (E00117), Seq #: 000114 in TLU: st_9840_acs_0, media: STK 984e
Orig Vol: 0303E68522777483 (E00486), Seq #: 000800 in TLU: st_9840_acs_0, media: STK 984e

07/12/2004 18:13:17 Rotation ID:4A03CC27.A30DEE72.0200.0E0B8707, 5 backups
Media duplication is not enabled.

*Orig Vol: 4A03CC27A30DEE72 (E00632), Seq #: 000273 in TLU: st_9840_acs_0, media: STK 984e

Here is my code:

foreach ($EDM_nonactive_tapelist) {
    if ($EDM_nonactive_tapelist !~ \^\*Orig) {
        print $_;
    }
}

*NOTE the variable $EDM_nonactive_tapelist has the Orig strings in it. Does foreach read line by line? Do I even need the foreach statement?
thank you!
Derek B. Smith
OhioHealth IT
UNIX / TSM / EDM Teams
Re: regex problem
[EMAIL PROTECTED] wrote:
All I am getting the error from my if statement:
^* matches null string many times in regex; marked by <-- HERE in m/^* <-- HERE Orig/ at .
I am trying to get everything except *Orig in this output:
<sample data snipped>
Here is my code:
foreach ($EDM_nonactive_tapelist) {
    if ($EDM_nonactive_tapelist !~ \^\*Orig) {
        print $_;
    }
}

- The ^ character must not be escaped when marking the beginning of a string.
- You need to tell Perl that you want to use the m// operator, either like m<^\*Orig> or by using straight slashes: /^\*Orig/

But why use a regex at all?

print unless substr($_, 0, 5) eq '*Orig';

*NOTE the variable $EDM_nonactive_tapelist has the Orig strings in it. Does foreach read line by line?

Not unless you tell Perl so:

foreach ( split /\n/, $EDM_nonactive_tapelist ) {
    print "$_\n" unless substr($_, 0, 5) eq '*Orig';
}

Do I even need the foreach statement?

No.

print map "$_\n", grep { substr($_, 0, 5) ne '*Orig' }
    $EDM_nonactive_tapelist =~ /(.+)/mg;

;-)
-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
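A sketch of the line filter, assuming `$EDM_nonactive_tapelist` holds newline-separated lines; the helper name `non_orig_lines` and the shortened sample data are illustrative only. Only the `*` is escaped: the `^` stays unescaped so it anchors the match at the start of each line.

```perl
use strict;
use warnings;

# Hypothetical helper: every line of a multi-line string except those
# that start with a literal "*Orig".
sub non_orig_lines {
    my ($text) = @_;
    return grep { !/^\*Orig/ } split /\n/, $text;
}

my $tapelist = "*Orig Vol: 1703FBBDED58D4AD (E00117)\n"
             . "Orig Vol: 0303E68522777483 (E00486)\n"
             . "Media duplication is not enabled.\n";
print "$_\n" for non_orig_lines($tapelist);
```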
Re: regex problem
I am still getting the same error with your suggestion. Does foreach read line by line? Do I need the foreach?
Derek B. Smith
OhioHealth IT
UNIX / TSM / EDM Teams
614-566-4145

Felix Li [EMAIL PROTECTED]
08/09/2004 03:56 PM
To: [EMAIL PROTECTED]
cc:
Subject: Re: regex problem

perhaps you meant ^\* ... rather than \^\* ...
the latter looks for a literal ^* instead of anchoring at the start of the string ...

----- Original Message -----
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, August 09, 2004 3:54 PM
Subject: regex problem
RE: regex problem
[EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
: All I am getting the error from my if statement:
:
: ^* matches null string many times in regex; marked by <--
: HERE in m/^* <-- HERE Orig/ at .
:
: I am trying to get everything except *Orig in this output:
:
: *Orig Vol: 1703FBBDED58D4AD (E00117), Seq #: 000114 in TLU:
: st_9840_acs_0, media: STK 984e
: Orig Vol: 0303E68522777483 (E00486), Seq #: 000800 in TLU:
: st_9840_acs_0, media: STK 984e
:
: 07/12/2004 18:13:17 Rotation
: ID:4A03CC27.A30DEE72.0200.0E0B8707, 5
: backups
: Media duplication is not enabled.
:
: *Orig Vol: 4A03CC27A30DEE72 (E00632), Seq #: 000273 in TLU:
: st_9840_acs_0, media: STK 984e
:
: Here is my code:
:
: foreach ($EDM_nonactive_tapelist) {
:     if ($EDM_nonactive_tapelist !~ \^\*Orig) {
:         print $_;
:     }
: }
:
: *NOTE the variable $EDM_nonactive_tapelist has the Orig strings
: in it. Does foreach read line by line?

No. 'foreach' as used above aliases $_ to each element of a list of scalars, one item at a time. The function does not know the concept of a line. You have provided a list of one scalar: $EDM_nonactive_tapelist. The loop will process $EDM_nonactive_tapelist once, with $_ aliased to it. Any changes to $_ will also change $EDM_nonactive_tapelist.

Assuming $EDM_nonactive_tapelist holds a list of strings separated by newlines (\n), you can loop over those strings like this:

foreach my $string ( split /\n/, $EDM_nonactive_tapelist ) {
    print "$string\n" unless $string =~ /^\*Orig/;
}

In this example we have taken each string and placed it in a scalar variable named $string. $string is tested and printed if it does not start with *Orig. The 'split' splits the text at each newline and discards that character.

: Do I even need the foreach statement?

I'm not sure. "has the Orig strings in it" is not a precise statement for a computer programmer.

Question: How did this list of strings get into a single scalar?

HTH,
Charles K. Clarkson
-- 
Mobile Homes Specialist
254 968-8328
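To see Charles's point about `foreach` in action, this small demo (variable names are illustrative) counts iterations over a single scalar and over the result of `split`, and shows that `$_` is an alias, so a substitution inside the loop changes the original variable.

```perl
use strict;
use warnings;

my $list = "a\nb\nc\n";

# foreach over a single scalar runs exactly once, with $_ aliased to it.
my $count = 0;
foreach ($list) {
    $count++;
    s/\n/,/g;    # modifies $list itself through the alias
}
print "iterations: $count, list is now: $list\n";

# split returns one element per line, so the loop body runs once per line.
my $lines = 0;
$lines++ foreach split /\n/, "a\nb\nc\n";
print "lines: $lines\n";
```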
RE: regex problem
it is a system app call that populates the $EDM_nonactive_tapelist.

I am not sure what you mean by "I'm not sure. 'has the Orig strings in it' is not a precise statement for a computer programmer." The variable $EDM_nonactive_tapelist is a file with the Orig strings in it!

the foreach with the split did work! thanks!
Derek B. Smith
OhioHealth IT
UNIX / TSM / EDM Teams

Charles K. Clarkson [EMAIL PROTECTED]
08/09/2004 05:41 PM
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
cc:
Subject: RE: regex problem
RE: regex problem
[EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote:
: it is a system app call that populates the
: $EDM_nonactive_tapelist I am not sure what you mean
: "I'm not sure. has the Orig strings in it is not a
: precise statement for a computer programmer."

I meant that "has the Orig strings in it" does not tell us how the strings are represented. It does not precisely define how the data is structured. That statement does not accurately describe the data.

Here are two examples of strings listed in a scalar. In both cases I could describe each of these examples as "a scalar variable with strings in it".

$baz = [
    [ 'foo' ],
    [ 'bar' ],
];

$baz = "foo\nbar\n";

As computer programmers, we have to describe data precisely. If you are uncertain how to describe a structure, try printing it with Data::Dumper.

: the foreach with the split did work!

Great. I'm glad I could help.

HTH,
Charles K. Clarkson
-- 
Mobile Homes Specialist
254 968-8328
Yet Another Regex Problem
Hi guys,
these regexes are going to drive me crazy!
My problem is: I have to find URLs in a text file (so, cannot use LWP or an HTML parser).
I've tried with something like

/(http.:\/\/.*\s)/

hoping to find anything starting with http/https followed by :// and catching everything up to a space or newline.
It works in some cases, but it catches the widest possible match, so if I have something like

"try to click here http://www.yahoo.com or there http://www.google.com"

the result for $1 is:

"http://www.yahoo.com or there http://www.google.com"

How can I get simply "http://www.yahoo.com" and then "http://www.google.com"?
thanks very much... you're saving a man
Francesco
Re: Yet Another Regex Problem
On Jun 8, Francesco del Vecchio said:
I have to find URLs in a text file (so, cannot use LWP or HTML parser)

I'm curious why you can't use a module to extract URLs, but I'll continue anyway.

/(http.:\/\/.*\s)/

That regex is broken in a few ways. First, it does NOT match 'http:', it only matches 'http_:', where there is some character between the p and the colon. Second, the .* in it is greedy (it matches as much as it can). Third, it requires your URL to be followed by a space, which won't always be the case.

"try to click here http://www.yahoo.com or there http://www.google.com"

I would suggest trying:

@urls = $string =~ m{(https?://\S+)}g;

Using \S+ makes it match one or more non-whitespace characters. The only problem with this is that if there happens to be punctuation after the URL, it'll get included. An example is this:

Go to http://www.yahoo.com, and you'll see what I mean.

That will match 'http://www.yahoo.com,' (including the comma).

-- 
Jeff "japhy" Pinyan         [EMAIL PROTECTED]         http://www.pobox.com/~japhy/
RPI Acacia brother #734     http://www.perlmonks.org/      http://www.cpan.org/
CPAN ID: PINYAN    [Need a programmer? If you like my work, let me know.]
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
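A sketch of Jeff's `\S+` approach plus one way to deal with the trailing-punctuation problem he mentions: strip common sentence punctuation from the end of each match. The helper name `find_urls` is hypothetical, and the trimming is only a heuristic; a URL that genuinely ends in one of those characters would be cut short.

```perl
use strict;
use warnings;

# Hypothetical helper: find http/https URLs as runs of non-whitespace,
# then trim trailing characters that more likely belong to the sentence.
sub find_urls {
    my ($text) = @_;
    my @urls = $text =~ m{(https?://\S+)}g;
    s/[.,;:!?'")\]]+\z// for @urls;    # for aliases $_ to each element
    return @urls;
}

my $text = 'Go to http://www.yahoo.com, or there http://www.google.com.';
print "$_\n" for find_urls($text);
```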
Re: Yet Another Regex Problem
Change your regex to

/(https?:\/\/.*?)\s/

To see the docs:

perldoc perlre

... look for "greedy"

HTH
Ram

On Tue, 2004-06-08 at 16:15, Francesco del Vecchio wrote:
Re: Substitution/Regex problem
On Thursday 29 April 2004 10:31, Owen wrote:
I would like to replace all instances of
@non_space_characters[non_space_characters]
with
$non_space_characters[non_space_characters]
The program below gets the first one only. How do I get the others?
TIA
Owen
---
#!/usr/bin/perl -w
use strict;
my $line;
while (<DATA>){
$line=$_;
#$line=~s/(@)(\S+)(\[\S+\])/\$$2$3/g;
$line=~s/(@)(\S+\[\S+\])/\$$2/g;
print "$line\n";
}
__DATA__
@[EMAIL PROTECTED]@banana[4];

don't be greedy ;-)

s/\@(\S+?\[\S+?\])/\$$1/g;
Re: Substitution/Regex problem
Owen wrote:
I would like to replace all instances of
@non_space_characters[non_space_characters]
with
$non_space_characters[non_space_characters]
The program below gets the first one only. How do I get the others?
---
#!/usr/bin/perl -w
use strict;
my $line;
while (<DATA>){
$line=$_;

Why not just:

while ( my $line = <DATA> ) {

#$line=~s/(@)(\S+)(\[\S+\])/\$$2$3/g;
$line=~s/(@)(\S+\[\S+\])/\$$2/g;

+, * and ? are greedy, so they will match the longest string that they can. Your complete line up to the newline will be matched by \S+, so you want to be more selective in what you match. Since user-defined variable names must consist of alphanumeric and _ characters, you can use \w+ instead. Also, why are you capturing the @ into $1?

$line =~ s/@(\w+\[[^]]+])/\$$1/g;

print "$line\n";
}
__DATA__
@[EMAIL PROTECTED]@banana[4];

John
-- 
use Perl;
program
fulfillment
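A runnable check of John's substitution, wrapped in a hypothetical `slices_to_elements` helper, with the closing `]` escaped for readability. Because `\w+` cannot match `@` or `[`, and `[^]]+` stops at the first `]`, each `@name[...]` on the line is matched separately.

```perl
use strict;
use warnings;

# Hypothetical helper: turn slices written as @name[index] into
# element access $name[index], for every occurrence on the line.
sub slices_to_elements {
    my ($line) = @_;
    $line =~ s/\@(\w+\[[^]]+\])/\$$1/g;
    return $line;
}

print slices_to_elements('my $x = @apple[3] + @banana[4];'), "\n";
```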
Regex problem
I have a directory of files that I want to move to another directory (e.g. ALLY20030111W.eps, TEST W20030122, HELP WANTED20030901WW.eps, GIRL WATCH BIRD 20030101, etc.).
I want to be able to parse the filename and replace the date portion with any date (e.g. $1=ALLY $2=20030111 $3=W $4=.eps).
Then I want to make $2=20030925, and if $3 is empty then assign .eps to $3, or if $4 is empty then assign .eps.
How do I do this?

#!/usr/bin/perl
# move_file.plx
use warnings;
use strict;

$source = "/path/to/source/";
$destination = "/path/to/destination/";
$query = ([A-Za-z]+)(\s*?)([0-9]*)(\s*?)([A-Za-z]*)([eps])

opendir DH, $source or die "Couldn't open the current directory: $source";
while ($_ = readdir(DH)) {
    next if $_ eq "." or $_ eq "..";
    if (/$query/) {
        print "Copying $_ ...\n";
        rename "$source$_", "$destination$_";
        print "file copied successfully.\n";
    }
}

What's wrong with my code? Am I overlooking something?
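A sketch of the renaming step for names like the ones listed: pull out an 8-digit date, substitute a new one, and add `.eps` when the name has no extension. The helper `rename_with_date` is hypothetical, and the pattern assumes the date is always exactly eight digits, followed by optional letters and an optional `.eps`; other layouts would need a different pattern. Separately, under `use strict` the script's `$source`, `$destination` and `$query` would each need a `my` declaration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the 8-digit date in a file name and make
# sure the result ends in .eps. Returns undef if no 8-digit date is found.
sub rename_with_date {
    my ($name, $new_date) = @_;
    # prefix, old date, letter suffix, optional .eps extension
    my ($prefix, $old_date, $suffix, $ext) =
        $name =~ /^(.*?)(\d{8})([A-Za-z]*)(\.eps)?\z/
        or return;
    $ext = '.eps' unless defined $ext;
    return "$prefix$new_date$suffix$ext";
}

for my $file ('ALLY20030111W.eps', 'TEST W20030122') {
    my $new = rename_with_date($file, '20030925');
    print "$file -> $new\n" if defined $new;
}
```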
REGEX PROBLEM
hi,
i have the following strings:

/tmp/test/.test.txt
/tmp/test/hallo.txt
/tmp/test/xyz/abc.txt
/var/log/ksy/123.log

now i need a regex that matches all lines but the one that contains a filename starting with a point, like .test.txt. how can i do that?
this is what i have:

'\.(?!tgz)[^.]*$'

this matches everything but tgz at the end of a line, so

'(?!\.)[^.]*$'

should do the job, but it doesn't :(
THANK YOU :)
Re: matching file names starting with a dot (was: REGEX PROBLEM)
magelor wrote at Fri, 25 Jul 2003 11:09:03 +0200:
/tmp/test/.test.txt
/tmp/test/hallo.txt
/tmp/test/xyz/abc.txt
/var/log/ksy/123.log
now i need a regex that matches all lines but the one that contains a filename starting with a point, like .test.txt. how can i do that?
this is what i have: '\.(?!tgz)[^.]*$' this matches everything but tgz at the end of a line, so '(?!\.)[^.]*$' should do the job, but it doesn't :(

If you only want to guarantee that the base filename doesn't start with a dot, you might try something like

m!/(?!\.)\w+\.\w+$!
# or
m!/[^.]+\.\w+$!
# or
m!/[^/.][^/]*$!

The first two check whether there is a *.* file (with no leading \.) after the last slash. The third checks whether the string ends in a path component (no slashes) whose first character is not a dot, which also does what you might want.

However, in general I would propose using a module to get an easily understandable and robust solution:

use File::Basename;   # core module, ships with Perl

sub is_file_starting_with_dot {
    return basename($_[0]) =~ /^\./;
}

foreach ("/tmp/test/.test.txt",
         "/tmp/test/hallo.txt",
         "/tmp/test/xyz/abc.txt",
         "/var/log/ksy/123.log",
        ) {
    print $_, is_file_starting_with_dot($_) ? " starts with dot" : " :-)";
    print "\n";
}

Best Wishes,
Janek

PS: It's better not to shout at the reader with an all-uppercase subject that isn't very detailed. I would have ignored you if it weren't Friday :-)
Re: REGEX PROBLEM
On Friday, Jul 25, 2003, at 18:09 Asia/Tokyo, [EMAIL PROTECTED] wrote:
/tmp/test/.test.txt
/tmp/test/hallo.txt
/tmp/test/xyz/abc.txt
/var/log/ksy/123.log
now i need a regex that matches all lines but the one that contains a filename starting with a point, like .test.txt. how can i do that?
this is what i have: '\.(?!tgz)[^.]*$' this matches everything but tgz at the end of a line, so '(?!\.)[^.]*$' should do the job, but it doesn't :(

because your expression matches, for example, just 'txt' too.
Try

/\/[^.\/][^\/]+$/g;

or

/^.+\/[^.\/][^\/]+$/g;

Kino
Re: REGEX PROBLEM
[EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
hi,
i have the following strings:
/tmp/test/.test.txt
/tmp/test/hallo.txt
/tmp/test/xyz/abc.txt
/var/log/ksy/123.log
now i need a regex that matches all lines but the one that contains a filename starting with a point, like .test.txt. how can i do that?
this is what i have: '\.(?!tgz)[^.]*$' this matches everything but tgz at the end of a line, so '(?!\.)[^.]*$' should do the job, but it doesn't :(

The following will help. It first captures the file's basename in $1 (the string of all trailing characters which aren't '/'), and then checks that the basename doesn't start with a dot.

HTH,
Rob

while (<DATA>) {
    chomp;
    if ( m<([^/]*)$> and $1 =~ /^[^.]/ ) {
        print $_, "\n";
    }
}
__DATA__
/tmp/test/.test.txt
/tmp/test/hallo.txt
/tmp/test/xyz/abc.txt
/var/log/ksy/123.log

OUTPUT

/tmp/test/hallo.txt
/tmp/test/xyz/abc.txt
/var/log/ksy/123.log
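A quick way to compare a regex-only check with the File::Basename check discussed in this thread, on the same sample paths. The anchored pattern `m{/[^/.][^/]*\z}` requires the last path component to start with something other than a dot; both filters are expected to keep the same three paths.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my @paths = qw(
    /tmp/test/.test.txt
    /tmp/test/hallo.txt
    /tmp/test/xyz/abc.txt
    /var/log/ksy/123.log
);

# Regex only: the last path component must not start with a dot.
my @by_regex = grep { m{/[^/.][^/]*\z} } @paths;

# Module: take the base name and reject it if it starts with a dot.
my @by_basename = grep { basename($_) !~ /^\./ } @paths;

print "$_\n" for @by_regex;
```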