Re: regex help - only one value returned
In your original example:

    print "match1='$1' '$2'\n" if ($T=~/^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
    print "match2='$1' '$2'\n" if ($T=~/^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi);

the interior parentheses in example one terminate the alternation, so the last alternative is 'sir'. In example two, the alternation is not terminated until the first ')', so the last alternative is 'sir .{5,}?', and whichever alternative matches must be followed immediately by the "\n" character. Since in $T 'miss' is not followed by a "\n", the match fails.

Vlado has explained how to group and terminate the alternation without capturing the match result.

> On Dec 2, 2020, at 6:08 AM, Gary Stainburn wrote:
>
> On 02/12/2020 13:56, Vlado Keselj wrote:
>> Well, it seems that the first one is what you want, but you just need to
>> use $1 and ignore $2.
>>
>> You do need parentheses in '(mr|mrs|miss|dr|prof|sir)' but if you do not
>> want for them to be captured in $2, you can use:
>> '(?:mr|mrs|miss|dr|prof|sir)'. For example:
>>
>> print "match3='$1' '$2'\n" if
>> ($T=~/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
>>
>> would give output:
>>
>> match3='Miss Jayne Doe' ''
> Perfect, thank you.
>
> I can't ignore $2 as it's in a loop with other regex that genuinely returns
> multiple matches. The amendment to the REGEX worked perfectly.

It is always best to save the results of a match with capturing in another variable. The capturing variables $1, $2, etc. are not reassigned if a match fails, so if you use them after a failed match, they will hold the values left over from a previous match. So do this immediately after a successful match:

    my $salutation = $1;
    my $name = $2;

If you don't want a possible undefined value, do this instead:

    my $name = $2 || '';

Jim Gibson
j...@gibson.org

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
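Jim's advice about copying the capture variables can be sketched roughly as follows. The sample lines and the names @lines and @results are made up for illustration and are not from Gary's program; the point is only that $1 and $2 are read inside the branch where the match is known to have succeeded.

```perl
use strict;
use warnings;

# Hypothetical sample lines, not taken from the thread.
my @lines = ('Miss Jayne Doe', 'no title here', 'Dr John Smith');

my @results;
for my $line (@lines) {
    # Copy the captures only when the match succeeded; after a failed
    # match $1 and $2 are not reset and may hold older values.
    if ($line =~ /^(mr|mrs|miss|dr|prof|sir) (.{5,})$/i) {
        my ($salutation, $name) = ($1, $2);
        push @results, "$salutation|$name";
    }
    else {
        push @results, 'no match';
    }
}

print "$_\n" for @results;
```

Copying into lexical variables right away also protects the values from being overwritten by the next regex in a loop such as the one Gary describes.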
Re: regex help - only one value returned
On 02/12/2020 13:56, Vlado Keselj wrote: Well, it seems that the first one is what you want, but you just need to use $1 and ignore $2. You do need parentheses in '(mr|mrs|miss|dr|prof|sir)' but if you do not want for them to be captured in $2, you can use: '(?:mr|mrs|miss|dr|prof|sir)'. For example: print "match3='$1' '$2'\n" if ($T=~/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi); would give output: match3='Miss Jayne Doe' '' Perfect, thank you. I can't ignore $2 as it's in a loop with other regex that genuinely returns multiple matches. The amendment to the REGEX worked perfectly. Gary -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: regex help - only one value returned
Well, it seems that the first one is what you want, but you just need to use $1 and ignore $2.

You do need parentheses in '(mr|mrs|miss|dr|prof|sir)', but if you do not want them to be captured in $2, you can use '(?:mr|mrs|miss|dr|prof|sir)'. For example:

    print "match3='$1' '$2'\n" if
    ($T=~/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);

would give output:

    match3='Miss Jayne Doe' ''

On Wed, 2 Dec 2020, Gary Stainburn wrote:

> I have an array of regex expressions that I apply to text returned from
> tesseract.
>
> Each match that I get then gets stored for future processing. However, I'm
> struggling with one regex.
>
> The problem is that:
>
> 1) with brackets round the titles it returns two matches.
> 2) without brackets, it returns nothing.
>
> Can anyone point me at the correct syntax please.
>
> Gary
>
> [root@dev dev]# ./t
> match1='Miss Jayne Doe' 'Miss'
> [root@dev dev]# cat t
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $T=<<EOF;
> Customer name and address
> Miss Jayne Doe
> 19 Their Street
> Somewehere
> In Yorkshire
> IN1 3YY
> EOF
>
> print "match1='$1' '$2'\n" if ($T=~/^((mr|mrs|miss|dr|prof|sir)
> .{5,}?)\n/smi);
> print "match2='$1' '$2'\n" if ($T=~/^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi);
> [root@dev dev]#
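To see the three variants side by side, here is a small sketch that runs each pattern from this thread in list context and counts what comes back. The variable names and the shortened $text are assumptions made for the example; the patterns themselves are the ones quoted above.

```perl
use strict;
use warnings;

# Shortened stand-in for the $T string in Gary's script.
my $text = "Customer name and address\nMiss Jayne Doe\n19 Their Street\n";

# Capturing group around the titles: two captures come back.
my @with_capture = $text =~ /^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi;

# Non-capturing (?:...) group: same match, but only one capture.
my @without_capture = $text =~ /^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi;

# No inner group: the alternatives are 'mr', 'mrs', 'miss', 'dr', 'prof'
# and 'sir .{5,}?'; 'miss' is not directly followed by a newline, so the
# match fails and the list is empty.
my @no_group = $text =~ /^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi;

printf "capturing group:     %d value(s)\n", scalar @with_capture;
printf "non-capturing group: %d value(s)\n", scalar @without_capture;
printf "no inner group:      %d value(s)\n", scalar @no_group;
```

Using list context here avoids touching $1 and $2 at all, which may be convenient in a loop that applies many different patterns.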
Re: Regex help needed
Punit Jain,

This is not optimized code, but you can refactor it. It works for the given scenario, no matter the order of the input data. Hope it helps to some extent.

[code]
my $var = '';
my @args = ();
my %hash;
while (<DATA>) {
    chomp;
    my ($var,$arg) = split /=/,$_,2;
    if($var eq '{') {
        @args = (); #Reset if we encounter '{'
    }
    # Avoid "my ... if ...": its behaviour is undefined when the condition is false
    my @arg1 = defined $arg ? split(/,/,$arg) : ();
    if(scalar @arg1 > scalar @args) {
        $hash{$var} = $arg unless($var eq '{' || $var eq '}');
        @args = @arg1;
    }
}
foreach my $k (sort keys %hash) {
    print "$k = $hash{$k}\n";
}
__DATA__
{
test = (test123);
test = (test123,abc,xyz);
test = (test123,abc);
}
{
test1 = (passfile,pasfile1,user);
test1 = (passfile);
test1 = (passfile,pasfile1);
}
{
test2 = (temp);
test2 = (temp,temp1);
test2 = (temp,temp1,username);
}
{
test3 = (betty,betty1,jack);
test3 = (betty,betty1);
test3 = (betty);
}
[/code]

[output]
test = (test123,abc,xyz);
test1 = (passfile,pasfile1,user);
test2 = (temp,temp1,username);
test3 = (betty,betty1,jack);
[/output]

best,
Shaji
---
Your talent is God's gift to you. What you do with it is your gift back to God.
---

From: punit jain contactpunitj...@gmail.com
To: beginners@perl.org beginners@perl.org
Sent: Tuesday, 8 January 2013 5:58 PM
Subject: Regex help needed

Hi ,

I have a file as below : -

{
test = (test123);
test = (test123,abc);
test = (test123,abc,xyz);
}
{
test1 = (passfile);
test1 = (passfile,pasfile1);
test1 = (passfile,pasfile1,user);
}

and so on

The requirement is to have the file parsing so that final output is :-

test = (test123,abc,xyz);
test1 = (passfile,pasfile1,user);

So basically only pick the lines with maximum number of options for each type.

Regards.
Re: Regex help needed
On 2013-01-08 13:28, punit jain wrote: { test = (test123); test = (test123,abc); test = (test123,abc,xyz); } { test1 = (passfile); test1 = (passfile,pasfile1); test1 = (passfile,pasfile1,user); } and so on The requirement is to have the file parsing so that final output is :- test = (test123,abc,xyz); test1 = (passfile,pasfile1,user); So basically only pick the lines with maximum number of options for each type. Or just print the last long line: echo '{ test = (test123); test = (test123,abc); test = (test123,abc,xyz); } { test1 = (passfile); test1 = (passfile,pasfile1); test1 = (passfile,pasfile1,user); } ' |perl -wne'$o=$n||0;$p=$_,next if($n=length)$o;$n=3;print$p' test = (test123,abc,xyz); test1 = (passfile,pasfile1,user); Which preserves order too. :) -- Ruud -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: Regex help needed
On Jan 8, 2013, at 4:28 AM, punit jain wrote:

> Hi ,
>
> I have a file as below : -
>
> {
> test = (test123);
> test = (test123,abc);
> test = (test123,abc,xyz);
> }
> {
> test1 = (passfile);
> test1 = (passfile,pasfile1);
> test1 = (passfile,pasfile1,user);
> }
>
> and so on
>
> The requirement is to have the file parsing so that final output is :-
>
> test = (test123,abc,xyz);
> test1 = (passfile,pasfile1,user);
>
> So basically only pick the lines with maximum number of options for each type.

The easiest solution I can think of would be to extract the first token on each line, use that token as a hash key, count the number of commas in each line, and save in the hash, for each key, the line with the largest number of commas.

This will not work if the strings themselves can contain commas. In that case, you might want to consider using a parsing module, such as Text::CSV, that will correctly handle your input data. You can use Text::CSV to split your input lines into fields and count the number of fields. However, you will first have to extract the quoted strings from the surrounding parentheses. You can use the Text::Balanced module to do that. Both Text::CSV and Text::Balanced are available at CPAN (http://search.cpan.org).

The best way for you to learn programming will be to attempt writing a program to accomplish your task, then post your program if you have trouble getting it to do what you want.

Good luck.
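The comma-counting idea described above might look something like the sketch below. It is only one possible reading of the suggestion: the input is held in a string instead of a file, the names %best_line, %best_count and @order are invented for the example, and it assumes (as noted above) that the option strings contain no commas of their own.

```perl
use strict;
use warnings;

# Sample input copied from the question, held in a string for the sketch.
my $input = <<'EOT';
{
test = (test123);
test = (test123,abc);
test = (test123,abc,xyz);
}
{
test1 = (passfile);
test1 = (passfile,pasfile1);
test1 = (passfile,pasfile1,user);
}
EOT

my (%best_line, %best_count, @order);
for my $line (split /\n/, $input) {
    # The first token on the line is the hash key; brace-only lines are skipped.
    my ($key) = $line =~ /^\s*(\w+)\s*=/ or next;
    my $commas = () = $line =~ /,/g;    # number of commas on this line
    if (!exists $best_count{$key}) {
        push @order, $key;              # remember first-seen order
    }
    elsif ($commas <= $best_count{$key}) {
        next;                           # not longer than the saved line
    }
    $best_count{$key} = $commas;
    $best_line{$key}  = $line;
}

print "$best_line{$_}\n" for @order;
```

Keeping a separate @order array is a design choice so that the output follows the order of the input rather than the hash's internal order.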
Re: Regex help needed
Hi punit jain,

Please check my comments below.

On Tue, Jan 8, 2013 at 1:28 PM, punit jain contactpunitj...@gmail.com wrote:

> Hi ,
>
> I have a file as below : -
>
> {
> test = (test123);
> test = (test123,abc);
> test = (test123,abc,xyz);
> }
> {
> test1 = (passfile);
> test1 = (passfile,pasfile1);
> test1 = (passfile,pasfile1,user);
> }
>
> and so on
>
> The requirement is to have the file parsing so that final output is :-
>
> test = (test123,abc,xyz);
> test1 = (passfile,pasfile1,user);
>
> So basically only pick the lines with maximum number of options for each type.
>
> Regards.

I basically agree with Jim on this:

Jim> to learn programming will be to attempt writing a program to accomplish your task,
Jim> then post your program if you have trouble getting it to do what you want.

However, may I suggest using a hash, provided the line with the maximum number of options for each type *is the last one in each case*. Since *a hash keeps only one value per key*, each later line with the same key overwrites the earlier one. So, splitting each line on =, one can take the key and value for the hash.

So, based on the data presented, one can write it like so:

    use warnings;
    use strict;

    my %collection_hash;

    while (<DATA>) {
        chomp;
        if (/=/) {
            my ( $key, $value ) = split /=/, $_, 2;
            $collection_hash{$key} = $value;
        }
    }

    print $_, ' = ', $collection_hash{$_}, $/ for sort keys %collection_hash;

    __DATA__
    {
    test = (test123);
    test = (test123,abc);
    test = (test123,abc,xyz);
    }
    {
    test1 = (passfile);
    test1 = (passfile,pasfile1);
    test1 = (passfile,pasfile1,user);
    }

*OUTPUT:*

    test = (test123,abc,xyz);
    test1 = (passfile,pasfile1,user);

Please *NOTE* that this will only work as you want if the last line in each case has the maximum number of options, which is the case in the data you showed here.

--
Tim
Re: Regex help
On Sat, Dec 22, 2012 at 04:45:21PM +0530, punit jain wrote:

> Hi,
>
> I have a file like below : -
>
> BEGIN:VCARD
> VERSION:2.1
> EMAIL:te...@test.com
> FN:test1
> REV:20101116T030833Z
> UID:644938456.1419.
> END:VCARD
>
> From (S___-0003) Tue Nov 16 03:10:15 2010
> content-class: urn:content-classes:person
> Date: Tue, 16 Nov 2010 11:10:15 +0800
> Subject: test
> Message-ID: 644938507.1420
> MIME-Version: 1.0
> Content-Type: text/x-vcard; charset=utf-8
>
> BEGIN:VCARD
> VERSION:2.1
> EMAIL:te...@test.com
> FN:test2
> REV:20101116T031015Z
> UID:644938507.1420
> END:VCARD
>
> My requirement is to get all text between BEGIN:VCARD and END:VCARD and all
> the instances. So o/p should be :-
>
> BEGIN:VCARD
> VERSION:2.1
> EMAIL:te...@test.com
> FN:test1
> REV:20101116T030833Z
> UID:644938456.1419.
> END:VCARD
> BEGIN:VCARD
> VERSION:2.1
> EMAIL:te...@test.com
> FN:test2
> REV:20101116T031015Z
> UID:644938507.1420
> END:VCARD
>
> I am using below regex :-
>
> my $fh = IO::File->new($file, "r");
> my $script = do { local $/; <$fh> };
> close $fh;
> if ( $script =~ m/ (^BEGIN:VCARD\s*(.*) ^END:VCARD\s+)/sgmix ){
>     print OUTFILE $1."\n";
> }
>
> However it just prints 1st instance and not all.

It also prints the text between the two instances, right?

> Any suggestions ?

You need a non-greedy match .*? instead of the greedy match .* that you are using. Then you'll need to use while instead of if.

Or perhaps you'd prefer:

    $ perl -ne 'print if /BEGIN:VCARD/ .. /END:VCARD/' <in >out

or

    $ perl -n00e 'print if /^BEGIN:VCARD/' <in >out

See perldoc perlrun for the switches and "Range Operators" from perldoc perlop for ..

--
Paul Johnson - p...@pjcj.net
http://www.pjcj.net
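The first suggestion (a non-greedy .*? combined with while and /g) could be sketched as below. This is an illustration under assumptions: the input is a shortened stand-in held in a string, the /x and /i flags and the inner capture group from the original pattern are dropped for brevity, and @cards is a name introduced here.

```perl
use strict;
use warnings;

# Shortened stand-in for the slurped file contents shown in the question.
my $script = <<'EOT';
BEGIN:VCARD
FN:test1
END:VCARD
Subject: test
MIME-Version: 1.0
BEGIN:VCARD
FN:test2
END:VCARD
EOT

my @cards;
# .*? stops at the nearest END:VCARD instead of the last one, and
# while + /g resumes the search after each match.
while ($script =~ /(^BEGIN:VCARD\s*.*?^END:VCARD\s*)/smg) {
    push @cards, $1;
}

print scalar(@cards), " card(s) found\n";
print for @cards;
```

With the greedy .* the single match would run from the first BEGIN:VCARD to the last END:VCARD, which is why the mail headers between the cards were printed as well.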
Re: Regex help
On Sat, 22 Dec 2012 16:45:21 +0530 punit jain contactpunitj...@gmail.com wrote:

> Hi,
>
> I have a file like below : -

[snipped example - vcards with mail headers etc in between]

> My requirement is to get all text between BEGIN:VCARD and END:VCARD and all
> the instances. So o/p should be :-

[...]

> I am using below regex :-

[...]

> Any suggestions ?

You've already had a reply indicating how to solve the problem you were having with regexes, so I won't touch on that.

What I will advise is that, for any task you're trying to accomplish, there's a pretty good chance someone has already solved it and made code available on CPAN that will help you. So always check CPAN first, to avoid unnecessarily reinventing the wheel each time (unless you're doing so solely for a learning experience, of course).

In this case, parsing vcards is likely a common task. A quick look on CPAN turns up Text::vCard::Addressbook:

https://metacpan.org/module/Text::vCard::Addressbook

From the synopsis:

    use Text::vCard::Addressbook;

    my $address_book = Text::vCard::Addressbook->new(
        { 'source_file' => '/path/to/address.vcf', } );

    foreach my $vcard ( $address_book->vcards() ) {
        print "Got card for " . $vcard->fullname() . "\n";
    }

It will ignore the non-vcard content in the example you provided, and just provide you easy access to the data from each vcard. That's a much nicer approach than extracting it yourself with regexes.

Cheers

Dave P

--
David Precious (bigpresh) dav...@preshweb.co.uk
http://www.preshweb.co.uk/
www.preshweb.co.uk/twitter
www.preshweb.co.uk/linkedin
www.preshweb.co.uk/facebook
www.preshweb.co.uk/cpan
www.preshweb.co.uk/github
Re: Regex help
On 22/12/2012 11:15, punit jain wrote:

> Hi,
>
> I have a file like below : -
>
> BEGIN:VCARD
> VERSION:2.1
> EMAIL:te...@test.com
> FN:test1
> REV:20101116T030833Z
> UID:644938456.1419.
> END:VCARD
>
> From (S___-0003) Tue Nov 16 03:10:15 2010
> content-class: urn:content-classes:person
> Date: Tue, 16 Nov 2010 11:10:15 +0800
> Subject: test
> Message-ID: 644938507.1420
> MIME-Version: 1.0
> Content-Type: text/x-vcard; charset=utf-8
>
> BEGIN:VCARD
> VERSION:2.1
> EMAIL:te...@test.com
> FN:test2
> REV:20101116T031015Z
> UID:644938507.1420
> END:VCARD
>
> My requirement is to get all text between BEGIN:VCARD and END:VCARD and all
> the instances. So o/p should be :-
>
> BEGIN:VCARD
> VERSION:2.1
> EMAIL:te...@test.com
> FN:test1
> REV:20101116T030833Z
> UID:644938456.1419.
> END:VCARD
> BEGIN:VCARD
> VERSION:2.1
> EMAIL:te...@test.com
> FN:test2
> REV:20101116T031015Z
> UID:644938507.1420
> END:VCARD
>
> I am using below regex :-
>
> my $fh = IO::File->new($file, "r");
> my $script = do { local $/; <$fh> };
> close $fh;
> if ( $script =~ m/ (^BEGIN:VCARD\s*(.*) ^END:VCARD\s+)/sgmix ){
>     print OUTFILE $1."\n";
> }
>
> However it just prints 1st instance and not all.
>
> Any suggestions ?

This is very simply done with Perl's range operator. See the program below.

Rob

    use strict;
    use warnings;

    open my $fh, '<', 'vcard.txt' or die $!;

    while (<$fh>) {
        print if /^BEGIN:VCARD/ .. /^END:VCARD/;
    }

**output**

    BEGIN:VCARD
    VERSION:2.1
    EMAIL:te...@test.com
    FN:test1
    REV:20101116T030833Z
    UID:644938456.1419.
    END:VCARD
    BEGIN:VCARD
    VERSION:2.1
    EMAIL:te...@test.com
    FN:test2
    REV:20101116T031015Z
    UID:644938507.1420
    END:VCARD
Re: regex help
On Wed, Oct 19, 2011 at 1:10 AM, Leo Susanto leosusa...@gmail.com wrote:

> use strict;
> my %CELL;
> my %CELL_TYPE_COUNT;
> my $timestamp;
> my $hour;
>
> while (my $line = <DATA>) {
>     if ($line =~ m|\d{1,2}/\d{1,2}/\d{2} ((\d{1,2}):\d{1,2}:\d{1,2})|) { #10/17/11 18:25:20 #578030
>         $timestamp = $1;
>         $hour = $2;
>     }
>     if ($line =~ /CELL\s+(\d+)\s+(.+?),.+?HEH/) { # take CELL number into $1 and the information after the number (and before the first comma) into $2
>         if ((17 <= $hour)&&($hour <=21)) {
>             $CELL{$hour}{$1}{$2}++;
>             $CELL_TYPE_COUNT{$2}++;
>         }
>     }
> }

Would someone help me understand what this block of code is doing after the if condition? Is it utilizing references and counting the occurrences of the keys? I am having some trouble digesting it.

Thanks for the clarification,

Chris
Re: regex help
On 10/28/11 Fri Oct 28, 2011 2:15 PM, Chris Stinemetz chrisstinem...@gmail.com scribbled:

> On Wed, Oct 19, 2011 at 1:10 AM, Leo Susanto leosusa...@gmail.com wrote:
>> use strict;
>> my %CELL;
>> my %CELL_TYPE_COUNT;
>> my $timestamp;
>> my $hour;
>>
>> while (my $line = <DATA>) {
>>     if ($line =~ m|\d{1,2}/\d{1,2}/\d{2} ((\d{1,2}):\d{1,2}:\d{1,2})|) { #10/17/11 18:25:20 #578030
>>         $timestamp = $1;
>>         $hour = $2;
>>     }
>>     if ($line =~ /CELL\s+(\d+)\s+(.+?),.+?HEH/) { # take CELL number into $1 and the information after the number (and before the first comma) into $2
>>         if ((17 <= $hour)&&($hour <=21)) {
>>             $CELL{$hour}{$1}{$2}++;
>>             $CELL_TYPE_COUNT{$2}++;
>>         }
>>     }
>> }
>
> Would someone help me understand what this block of code is doing
> after the if condition? Is it utilizing references and counting the
> occurences of the keys? I am having some trouble digesting it.

The best way to figure out what it is doing is to print out the values of $hour, $1, $2, $CELL{$hour}{$1}{$2}, and $CELL_TYPE_COUNT{$2} before and after the if statement block.

You should be able to combine the two regular expressions being applied to $line, but I would need to see typical data lines to be sure, and to work out how to do that.

It looks like it is counting cells and cell types, judging from the names of the variables and the comments.
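As a rough illustration of what the two increment statements do, the sketch below feeds a few hand-made (hour, cell, type) triples through the same statements. The @events list is invented for the example; only the hash names and the ++ lines come from Leo's code.

```perl
use strict;
use warnings;

# Hypothetical (hour, cell number, info) triples standing in for the
# values that $hour, $1 and $2 would hold for successive log lines.
my @events = (
    [18, 221, 'CDM 2'],
    [18, 220, 'CDM 1'],
    [18, 220, 'CDM 1'],
    [19, 220, 'CDM 1'],
);

my (%CELL, %CELL_TYPE_COUNT);
for my $event (@events) {
    my ($hour, $cell, $type) = @$event;
    # The inner hashes are created on demand (autovivification), so this
    # builds hour => cell => type => count without any setup code.
    $CELL{$hour}{$cell}{$type}++;
    # A flat tally of how often each type was seen overall.
    $CELL_TYPE_COUNT{$type}++;
}

print "hour 18, cell 220, CDM 1: $CELL{18}{220}{'CDM 1'}\n";
print "CDM 1 overall: $CELL_TYPE_COUNT{'CDM 1'}\n";
```

So yes, references are involved: each $CELL{$hour} value is a reference to another hash, and the final level holds a plain counter.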
Re: regex help
use strict; my %CELL; my %CELL_TYPE_COUNT; my $timestamp; my $hour; while (my $line = DATA) { if ($line =~ m|\d{1,2}/\d{1,2}/\d{2} ((\d{1,2}):\d{1,2}:\d{1,2})|) { #10/17/11 18:25:20 #578030 $timestamp = $1; $hour = $2; } if ($line =~ /CELL\s+(\d+)\s+(.+?),.+?HEH/) { # take CELL number into $1 and the information after the number (and before the first comma) into $2 if ((17 = $hour)($hour =21)) { $CELL{$hour}{$1}{$2}++; $CELL_TYPE_COUNT{$2}++; } } } # header print HOUR, CELL,.join(, ,sort keys %CELL_TYPE_COUNT).\n; # body foreach my $hour (sort keys %CELL) { # you can use map function, but it never sits well on my brain foreach my $cellNo (sort keys %{$CELL{$hour}}) { print $hour, $cellNo; foreach my $info (sort keys %CELL_TYPE_COUNT) { if (exists $CELL{$hour}{$cellNo}{$info}) { print , $CELL{$hour}{$cellNo}{$info}; } else { print , 0; } } print \n; } } __DATA__ 10/17/11 10:25:20 #578030 25 REPT:CELL 221 CDM 2, CRC, HEH SUPPRESSED MSGS: 0 ERROR TYPE: ONEBTS MODULAR CELL ERROR SET: MLG BANDWIDTH CHANGE MLG 1 BANDWIDTH = 1536 00 00 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10/17/11 18:25:20 #578031 25 REPT:CELL 221 CDM 2, CRC, HEH SUPPRESSED MSGS: 0 ERROR TYPE: ONEBTS MODULAR CELL ERROR SET: DS1-MLG ASSOCIATION CHANGE MLG 1 DS1 1,2 00 00 00 00 00 00 00 00 03 00 00 00 01 00 05 05 #my own test data 10/17/11 18:25:20 #578031 25 REPT:CELL 220 CDM 1, CRC, HEH 10/17/11 18:25:20 #578031 25 REPT:CELL 220 CDM 1, CRC, HEH 10/17/11 19:25:20 #578031 25 REPT:CELL 220 CDM 1, CRC, HEH On Tue, Oct 18, 2011 at 1:16 AM, Chris Stinemetz chrisstinem...@gmail.com wrote: On Mon, Oct 17, 2011 at 10:57 PM, Leo Susanto leosusa...@gmail.com wrote: From looking at the regex if ($line =~ /17|18|19|20|21+:(\d+):(\d+)+\n+\n+CELL\s+(\d+)\s+(.+?),.+?HEH/){ against the data 10/17/11 18:25:20 #578030 25 REPT:CELL 221 CDM 2, CRC, HEH SUPPRESSED MSGS: 0 ERROR TYPE: ONEBTS MODULAR CELL ERROR SET: MLG BANDWIDTH CHANGE MLG 1 BANDWIDTH = 1536 I would assume $1 and $2 wouldn't match to anything 
plus $5 doesn't exist. Could you please let us know which part of the data you want to extract? Fill in the blanks $1= $2= $3= $4= $5= Thanks everyone. I hope this clarifies what I am trying to match. For example with this input: 10/17/11 18:25:20 #578030 25 REPT:CELL 221 CDM 2, CRC, HEH SUPPRESSED MSGS: 0 ERROR TYPE: ONEBTS MODULAR CELL ERROR SET: MLG BANDWIDTH CHANGE MLG 1 BANDWIDTH = 1536 $1= Match the time stamp Hour:Min:Sec only if the hour is = 17 and hour = 21 $2= capture CELL number $3= capture the information after the CELL number (and before the first comma) Thank you, Chris -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: regex help
Brian Fraser wrote:

> On Tue, Oct 18, 2011 at 12:32 AM, Chris Stinemetz chrisstinem...@gmail.com wrote:
>
>> /17|18|19|20|21+:(\d+):(\d+)+\n+\n+CELL\s+(\d+)\s+(.+?),.+?HEH/
>
> Spot the issue:
>
> /
>       17 #Or
>     | 18 #Or
>     | 19 #Or
>     | 20 #Or
>     | 21+:(\d+):(\d+)+\n+\n+CELL\s+(\d+)\s+(.+?),.+?HEH
> /x
>
> For anything but 21, the regex is only two digits! You need to enclose the
> alternatives in () or (?:), depending on whether you want to capture them
> or not.
>
> That aside, please be very mindful that \d and . are both code smells. The
> former will match much, much more than just [0-9] -- grab the unichars[0]
> program from Unicode::Tussle[1] if you want to see for yourself. Either use
> the /a switch (or the more localized form (?a:), both available in newer
> Perls), or [0-9], or \p{PosixDigit}, or (your favorite way here. TIMTOWTDI
> applies).
>
> The dot is also problematic. You aren't using the /s switch, so it actually
> matches [^\n].

Correct.

> Is that what you want? Are you certain that no one is going
> to come and, after reading Perl Best Practices, will try to helpfully but
> wrongly add the /smx flags and screw up your regex?

It doesn't really matter, because the regular expression is matched against a single line (read via readline). That line contains only one newline, at its end, so the pattern matches the same with or without the /s option. Also, as the regular expression contains neither the ^ anchor nor the $ anchor, it matches the same with or without the /m option.

John
--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein
Re: regex help
Chris Stinemetz wrote:

> Hello,

Hello,

[*SNIP*]

> #!/usr/bin/perl
> use warnings;
> use strict;
> use POSIX;
>
> # my $filepath = sprintf("/omp/omp-data/logs/OMPROP1/%s.APX",strftime("%y%m%d%H",localtime));
> my $filepath = ("/tmp/110923.APX"); # for testing
> my $runTime = sprintf("/home/cstinemetz/programs/%s.txt",strftime("%Y-%m-%d-%H:%M",localtime));
> my $fileDate = strftime("%y%m%d%H%",localtime);
>
> open my $fh, '<', $filepath or die "ERROR opening $filepath: $!";
> open my $out, '>', $runTime or die "ERROR opening $runTime: $!";
>
> my %date;
> my %cell;
> my %heh_type_count;
>
> while (my $line = <$fh>) {
>     if ($line =~ /17|18|19|20|21+:(\d+):(\d+)+\n+\n+CELL\s+(\d+)\s+(.+?),.+?HEH/){

As you haven't changed the Input Record Separator, the code

    my $line = <$fh>

will read one line from the file. That line has only one newline, at its end, so /\n+\n+/ in the middle of your regular expression will never match.

John
--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein
Re: regex help
On Mon, Oct 17, 2011 at 10:57 PM, Leo Susanto leosusa...@gmail.com wrote:

> From looking at the regex
>
>     if ($line =~ /17|18|19|20|21+:(\d+):(\d+)+\n+\n+CELL\s+(\d+)\s+(.+?),.+?HEH/){
>
> against the data
>
>     10/17/11 18:25:20 #578030 25 REPT:CELL 221 CDM 2, CRC, HEH SUPPRESSED MSGS: 0 ERROR TYPE: ONEBTS MODULAR CELL ERROR SET: MLG BANDWIDTH CHANGE MLG 1 BANDWIDTH = 1536
>
> I would assume $1 and $2 wouldn't match to anything, plus $5 doesn't exist.
>
> Could you please let us know which part of the data you want to extract?
>
> Fill in the blanks
> $1=
> $2=
> $3=
> $4=
> $5=

Thanks everyone. I hope this clarifies what I am trying to match.

For example with this input:

    10/17/11 18:25:20 #578030 25 REPT:CELL 221 CDM 2, CRC, HEH SUPPRESSED MSGS: 0 ERROR TYPE: ONEBTS MODULAR CELL ERROR SET: MLG BANDWIDTH CHANGE MLG 1 BANDWIDTH = 1536

$1= Match the time stamp Hour:Min:Sec only if the hour is >= 17 and hour <= 21
$2= capture CELL number
$3= capture the information after the CELL number (and before the first comma)

Thank you,

Chris
Re: regex help
Maybe this'll be helpful. :)

    my $time_rx = qr/(?<timestamp>
                        (?<hour> \d{2} )
                        (?: :\d{2} ){2}
                     )
                    /x;
    my $cell_record_rx = qr/CELL \s+ (?<cell_number> \d+) \s+ (?<cell_info> [^,]+) /x;

    my $records_ref;
    my $record_ts;
    while (<>) {
        if ($record_ts) { # looking for record data of this particular timestamp
            if (/$cell_record_rx/) {
                ++$records_ref->{$record_ts}{ $+{cell_number} }{ $+{cell_info} };
                undef $record_ts;
            }
        }
        else { #scanning for next valid record
            if (/$time_rx/ && $+{hour} >= 17 && $+{hour} <= 21) {
                $record_ts = $+{timestamp};
            }
        }
    }

-- iD

2011/10/18 Chris Stinemetz chrisstinem...@gmail.com

> On Mon, Oct 17, 2011 at 10:57 PM, Leo Susanto leosusa...@gmail.com wrote:
>> From looking at the regex
>>
>>     if ($line =~ /17|18|19|20|21+:(\d+):(\d+)+\n+\n+CELL\s+(\d+)\s+(.+?),.+?HEH/){
>>
>> against the data
>>
>>     10/17/11 18:25:20 #578030 25 REPT:CELL 221 CDM 2, CRC, HEH SUPPRESSED MSGS: 0 ERROR TYPE: ONEBTS MODULAR CELL ERROR SET: MLG BANDWIDTH CHANGE MLG 1 BANDWIDTH = 1536
>>
>> I would assume $1 and $2 wouldn't match to anything, plus $5 doesn't exist.
>>
>> Could you please let us know which part of the data you want to extract?
>>
>> Fill in the blanks
>> $1=
>> $2=
>> $3=
>> $4=
>> $5=
>
> Thanks everyone. I hope this clarifies what I am trying to match.
>
> For example with this input:
>
>     10/17/11 18:25:20 #578030 25 REPT:CELL 221 CDM 2, CRC, HEH SUPPRESSED MSGS: 0 ERROR TYPE: ONEBTS MODULAR CELL ERROR SET: MLG BANDWIDTH CHANGE MLG 1 BANDWIDTH = 1536
>
> $1= Match the time stamp Hour:Min:Sec only if the hour is >= 17 and hour <= 21
> $2= capture CELL number
> $3= capture the information after the CELL number (and before the first comma)
>
> Thank you,
>
> Chris
Re: regex help
From looking at the regex if ($line =~ /17|18|19|20|21+:(\d+):(\d+)+\n+\n+CELL\s+(\d+)\s+(.+?),.+?HEH/){ against the data 10/17/11 18:25:20 #578030 25 REPT:CELL 221 CDM 2, CRC, HEH SUPPRESSED MSGS: 0 ERROR TYPE: ONEBTS MODULAR CELL ERROR SET: MLG BANDWIDTH CHANGE MLG 1 BANDWIDTH = 1536 I would assume $1 and $2 wouldn't match to anything plus $5 doesn't exist. Could you please let us know which part of the data you want to extract? Fill in the blanks $1= $2= $3= $4= $5= On Mon, Oct 17, 2011 at 8:32 PM, Chris Stinemetz chrisstinem...@gmail.com wrote: Hello, I am getting the following error when I am trying to use regex to match a pattern and then access the memory variables: Any insight as to what I am doing wrong is greatly appreciated. Thank you, Chris Use of uninitialized value $1 in hash element at ./heh.pl line 22, $fh line 1211. Use of uninitialized value $4 in hash element at ./heh.pl line 22, $fh line 1211. Use of uninitialized value $5 in hash element at ./heh.pl line 22, $fh line 1211. 
An example of what I am trying to match is: 10/17/11 18:25:20 #578030 25 REPT:CELL 221 CDM 2, CRC, HEH SUPPRESSED MSGS: 0 ERROR TYPE: ONEBTS MODULAR CELL ERROR SET: MLG BANDWIDTH CHANGE MLG 1 BANDWIDTH = 1536 00 00 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10/17/11 18:25:20 #578031 25 REPT:CELL 221 CDM 2, CRC, HEH SUPPRESSED MSGS: 0 ERROR TYPE: ONEBTS MODULAR CELL ERROR SET: DS1-MLG ASSOCIATION CHANGE MLG 1 DS1 1,2 00 00 00 00 00 00 00 00 03 00 00 00 01 00 05 05 My program: #!/usr/bin/perl use warnings; use strict; use POSIX; # my $filepath = sprintf(/omp/omp-data/logs/OMPROP1/%s.APX,strftime(%y%m%d%H,localtime)); my $filepath = (/tmp/110923.APX); # for testing my $runTime = sprintf(/home/cstinemetz/programs/%s.txt,strftime(%Y-%m-%d-%H:%M,localtime)); my $fileDate = strftime(%y%m%d%H%,localtime); open my $fh, '', $filepath or die ERROR opening $filepath: $!; open my $out, '', $runTime or die ERROR opening $runTime: $!; my %date; my %cell; my %heh_type_count; while (my $line = $fh) { if ($line =~ /17|18|19|20|21+:(\d+):(\d+)+\n+\n+CELL\s+(\d+)\s+(.+?),.+?HEH/){ $cell{$1}{$4}{$5}++; $heh_type_count{$5}++; } } # header print HOUR\t.CELL\t.join(\t,sort keys %heh_type_count).\n; # body foreach my $cellNo (sort {$a = $b} keys %cell) { print $cellNo; foreach my $heh_hits (sort keys %heh_type_count) { if (exists $cell{$cellNo}{$heh_hits}) { print \t $cell{$cellNo}{$heh_hits}; } else { print \t 0; } } print \n; } -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/ -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: regex help
On Tue, Oct 18, 2011 at 12:32 AM, Chris Stinemetz chrisstinem...@gmail.com wrote:

> /17|18|19|20|21+:(\d+):(\d+)+\n+\n+CELL\s+(\d+)\s+(.+?),.+?HEH/

Spot the issue:

    /
          17 #Or
        | 18 #Or
        | 19 #Or
        | 20 #Or
        | 21+:(\d+):(\d+)+\n+\n+CELL\s+(\d+)\s+(.+?),.+?HEH
    /x

For anything but 21, the regex is only two digits! You need to enclose the alternatives in () or (?:), depending on whether you want to capture them or not.

That aside, please be very mindful that \d and . are both code smells. The former will match much, much more than just [0-9] -- grab the unichars[0] program from Unicode::Tussle[1] if you want to see for yourself. Either use the /a switch (or the more localized form (?a:), both available in newer Perls), or [0-9], or \p{PosixDigit}, or (your favorite way here. TIMTOWTDI applies).

The dot is also problematic. You aren't using the /s switch, so it actually matches [^\n]. Is that what you want? Are you certain that no one is going to come and, after reading Perl Best Practices, will try to helpfully but wrongly add the /smx flags and screw up your regex?

If you -really- want to match anything, use \p{Any} or \X, and you have to know the difference between the two, otherwise you are doing it wrong. See [2] and [3], though you might want to make a cup of tea and sit somewhere comfortable first, as they are neither easy nor quick reads.

But chances are that you don't want that. Which is actually much simpler! If you want to match anything-until-the-next-comma, use [^,]+

(And if you really want [^\n], you could use \N, which is not-a-newline, or even better, \V, which is not-a-vertical-space)

[0] https://www.metacpan.org/module/unichars
[1] https://metacpan.org/release/Unicode-Tussle
[2] http://www.nntp.perl.org/group/perl.perl5.porters/2011/07/msg174287.html
[3] http://www.nntp.perl.org/group/perl.perl5.porters/2011/07/msg174338.html
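The two points about alternation and [^,]+ can be tried out with a small sketch like the one below. It is not a drop-in fix for Chris's program: the two sample lines are assumed to arrive separately (as they would when reading line by line), and the variable names and the exact time pattern are choices made for the example.

```perl
use strict;
use warnings;

# Two lines based on the sample data in this thread, treated separately.
my $stamp_line = '10/17/11 18:25:20 #578030';
my $cell_line  = '25 REPT:CELL 221 CDM 2, CRC, HEH';

# Ungrouped: the alternatives are '17', '18', '19', '20' and '21:...',
# so the bare '17' inside the date already counts as a match.
my $ungrouped_matches = $stamp_line =~ /17|18|19|20|21:[0-9]+:[0-9]+/ ? 1 : 0;

# Grouped with (?:...): every hour alternative must be followed by ':MM:SS'.
my ($time) = $stamp_line =~ /\b((?:17|18|19|20|21):[0-9]{2}:[0-9]{2})\b/;

# [^,]+ takes everything up to, but not including, the first comma.
my ($cell, $info) = $cell_line =~ /CELL\s+([0-9]+)\s+([^,]+),.+?HEH/;

print "ungrouped pattern matched: $ungrouped_matches\n";
print "time: $time\n";
print "cell: $cell, info: $info\n";
```

Using [0-9] instead of \d follows the advice above about \d matching more than ASCII digits.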
Re: regex help
On Mon, Oct 10, 2011 at 4:56 PM, Chris Stinemetz chrisstinem...@gmail.com wrote:

> Any help is appreciated.
>
> Once I match HEH how can I alter the program to print the contents that
> are in the two lines directly above the match?
>
> For example in this case I would like the print results to be:
>
> **01 REPT:CELL 983 CDM 1, CCU 1, CE 5, HEH Timestamp: 10/10/11 00:01:18
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> while (my $hehline = <DATA>) {
>     chomp $hehline;
>     if ($hehline =~ /, HEH/) {
>         print "$hehline \n";
>     }
> }
>
> __DATA__
> 10/10/11 00:01:17 #984611
> A 01 REPT:CELL 833 CP FAILURE, UNANSWERED TERMINATION CDMA TRAFFIC CHANNEL CONFIRMATION FAILURE TRAFFIC CHANNEL FAILURE REASON - ACQUIRE MOBILE FAILURE [2] DCS 1 TG 1723 TM 374 SG 0 ANT 2 CARRIER 4, CHAN UNAVAIL FS-ECP ID 1, SYS ID 4681 DN 3168710330, MIN 3164094259, IMSI UNAVAIL SN ###2ddff3 MEID Xa0###629cc SCM ba ALW CDMA, ASGN CDMA CDM 1, CCU 2, CE 64, MLG 1/MLG_CDM 1 DCS 1/PSU 0/SM 3/BHS 6, ECP ID 1, SYS ID 4681
> 10/10/11 00:01:18 #984614
> **01 REPT:CELL 983 CDM 1, CCU 1, CE 5, HEH SUPPRESSED MSGS: 0 FT PL SECTOR 3 CARRIER 1 (1.9 GHz PCS) FAILURE: OUT OF RANGE PILOT LEVEL: MEASURED = 28.3 dBm EXPECTED = 33.8 dBm SECONDARY UNIT: CDM 1, CBR 3
>
> --
> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
> For additional commands, e-mail: beginners-h...@perl.org
> http://learn.perl.org/

Couple of ways. You could save the line numbers on the first pass, then read the file again and print the relevant lines. Remember that $. has the current line number.

my @lines;
while (...) {
    ...
    if (/,\s+HEH/) {
        push @lines, $.;
    }
}

Or you could do the same as above, but use Tie::File instead of reading the file twice.

Or you could keep saving the previous two lines (this assumes the HEH can't be on the first two lines. If it can, you'll have to modify the proggy accordingly):

my ($one, $two) = (scalar <DATA>, scalar <DATA>);
while (my $hehline = <DATA>) {
    ...
    if ($hehline =~ /,\s+HEH/) {
        say "[$one]\n[$two]";    # say needs: use feature 'say';
    }
    ($one, $two) = ($two, $hehline);
}

Or, looking at your data, you could read paragraphs instead of line-by-line -- apparently each chunk is separated by three (four?) newlines, so

{
    local $/ = "\n\n\n";
    while (my $hehline = <DATA>) {
        ... # shenanigans here
    }
}
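The paragraph-reading idea might look roughly like the sketch below. The record layout is an assumption: the sample text is invented, records are taken to be separated by blank lines, and heh_records is a hypothetical helper name. Setting $/ to the empty string (Perl's paragraph mode) is used here instead of a fixed "\n\n\n", because it tolerates any number of blank lines between records.

```perl
use strict;
use warnings;

# Invented sample: the real log's spacing is not fully known.
my $log = <<'END';
10/10/11 00:01:17 #984611
A 01 REPT:CELL 833
CP FAILURE, UNANSWERED TERMINATION

10/10/11 00:01:18 #984614
**01 REPT:CELL 983 CDM 1, CCU 1, CE 5, HEH
SUPPRESSED MSGS: 0
END

# Hypothetical helper: returns "HEH line Timestamp: date time" for every
# record that contains ", HEH".
sub heh_records {
    my ($text) = @_;
    my @found;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = '';    # paragraph mode: each read returns one record
    while (my $record = <$fh>) {
        next unless $record =~ /,\s+HEH/;
        # The timestamp is assumed to start the first line of the record.
        my ($stamp) = $record =~ m{^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d)}m;
        # The line that ends in ", HEH".
        my ($line)  = $record =~ /^(.*,\s+HEH)\s*$/m;
        next unless defined $stamp && defined $line;
        push @found, "$line Timestamp: $stamp";
    }
    close $fh;
    return @found;
}

print "$_\n" for heh_records($log);
```

Reading whole records avoids counting lines backwards from the match, at the cost of depending on the separator assumption holding for the real file.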
Re: regex help
Chris Stinemetz wrote in message Any help is appreciated. Once I match HEH how can alter the program to print the contents that are in the two lines directly above the match? For example in this case I would like the print results to be: **01 REPT:CELL 983 CDM 1, CCU 1, CE 5, HEHTimestamp: 10/10/11 00:01:18 [snip code and data] I think the following should work. Chris #!/usr/bin/perl use strict; use warnings; my $dt; # date_time while (DATA) { chomp; if (m!^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d)!) { $dt = $1; } elsif (/HEH$/) { print $_ Timestamp $dt\n; } } __DATA__ 10/10/11 00:01:17 #984611 A 01 REPT:CELL 833 CP FAILURE, UNANSWERED TERMINATION CDMA TRAFFIC CHANNEL CONFIRMATION FAILURE TRAFFIC CHANNEL FAILURE REASON - ACQUIRE MOBILE FAILURE [2] DCS 1 TG 1723 TM 374 SG 0 ANT 2 CARRIER 4, CHAN UNAVAIL FS-ECP ID 1, SYS ID 4681 DN 3168710330, MIN 3164094259, IMSI UNAVAIL SN ###2ddff3 MEID Xa0###629cc SCM ba ALW CDMA, ASGN CDMA CDM 1, CCU 2, CE 64, MLG 1/MLG_CDM 1 DCS 1/PSU 0/SM 3/BHS 6, ECP ID 1, SYS ID 4681 10/10/11 00:01:18 #984614 **01 REPT:CELL 983 CDM 1, CCU 1, CE 5, HEH SUPPRESSED MSGS: 0 FT PL SECTOR 3 CARRIER 1 (1.9 GHz PCS) FAILURE: OUT OF RANGE PILOT LEVEL: MEASURED = 28.3 dBmEXPECTED = 33.8 dBm SECONDARY UNIT: CDM 1, CBR 3 -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: regex help
Chris Stinemetz wrote:
> Any help is appreciated.

It looks like you don't need regex help.

> Once I match HEH how can I alter the program to print the contents that
> are in the two lines directly above the match?
>
> For example in this case I would like the print results to be:
>
> **01 REPT:CELL 983 CDM 1, CCU 1, CE 5, HEH Timestamp: 10/10/11 00:01:18
>
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> while (my $hehline = <DATA>) {
>     chomp $hehline;
>     if ($hehline =~ /, HEH/) {
>         print "$hehline \n";
>     }
> }

You could simplify that by not removing the newline and then adding it back in:

while ( my $hehline = <DATA> ) {
    if ( $hehline =~ /, HEH/ ) {
        print $hehline;
    }
}

And even simpler by not using the $hehline variable:

while ( <DATA> ) {
    print if /, HEH/;
}

But, back to your real problem. I can think of two ways to do it.

Number one: if you are sure that the line you want is _always_ two lines above you could use an array to hold the line you need:

my @buffer;
while ( my $hehline = <DATA> ) {
    push @buffer, $hehline;
    shift @buffer if @buffer > 3;
    if ( $hehline =~ /, HEH/ ) {
        print $buffer[ 0 ];
    }
}

Number two: better to just capture the line you require and only print it when the regular expression matches:

my $capture;
while ( my $hehline = <DATA> ) {
    $capture = $hehline if $hehline =~ m{^\d+/\d+/\d+\s+\d+:\d+:\d+\s+#\d+$};
    if ( $hehline =~ /, HEH/ ) {
        print $capture;
    }
}

John
--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein
Re: regex help
On 11-10-10 03:56 PM, Chris Stinemetz wrote: Any help is appreciated. Once I match HEH how can alter the program to print the contents that are in the two lines directly above the match? For example in this case I would like the print results to be: **01 REPT:CELL 983 CDM 1, CCU 1, CE 5, HEHTimestamp: 10/10/11 00:01:18 This is not quite what you asked for but it shows how to print lines before a match: #!/usr/bin/env perl use strict; use warnings; # number of lines to save my $save_nbr = 2; # save lines before pattern match # use undef for lines before beginning my @lines = ( undef ) x $save_nbr; while(my $hehline = DATA) { chomp $hehline; if ($hehline =~ /, HEH/) { # print before lines print $_\n for ( grep { defined } @lines ); # print current line print $hehline\n; } # remove first saved line shift @lines; # save current line push @lines, $hehline; } __DATA__ 10/10/11 00:01:17 #984611 A 01 REPT:CELL 833 CP FAILURE, UNANSWERED TERMINATION CDMA TRAFFIC CHANNEL CONFIRMATION FAILURE TRAFFIC CHANNEL FAILURE REASON - ACQUIRE MOBILE FAILURE [2] DCS 1 TG 1723 TM 374 SG 0 ANT 2 CARRIER 4, CHAN UNAVAIL FS-ECP ID 1, SYS ID 4681 DN 3168710330, MIN 3164094259, IMSI UNAVAIL SN ###2ddff3 MEID Xa0###629cc SCM ba ALW CDMA, ASGN CDMA CDM 1, CCU 2, CE 64, MLG 1/MLG_CDM 1 DCS 1/PSU 0/SM 3/BHS 6, ECP ID 1, SYS ID 4681 10/10/11 00:01:18 #984614 **01 REPT:CELL 983 CDM 1, CCU 1, CE 5, HEH SUPPRESSED MSGS: 0 FT PL SECTOR 3 CARRIER 1 (1.9 GHz PCS) FAILURE: OUT OF RANGE PILOT LEVEL: MEASURED = 28.3 dBmEXPECTED = 33.8 dBm SECONDARY UNIT: CDM 1, CBR 3 -- Just my 0.0002 million dollars worth, Shawn Confusion is the first step of understanding. Programming is as much about organization and communication as it is about coding. The secret to great software: Fail early often. Eliminate software piracy: use only FLOSS. Make something worthwhile. -- Dear Hunter -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: regex help
At 14:56 -0500 10/10/11, Chris Stinemetz wrote:
> Once I match HEH how can I alter the program to print the contents that
> are in the two lines directly above the match?

If it's only one instance you need to deal with then this should do the trick:

#!/usr/bin/perl
use strict;
my @lines;
while (<DATA>) {
    chomp;
    s/#.*$//;
    push @lines, $_;
    last if /HEH/;
}
print "$lines[-1] Timestamp: $lines[-3]";
__END__

JD
Re: Regex help
On 16/05/2011 23:44, Owen wrote:
> I am trying to get all the 6 letter names in the second field in DATA
> below, eg
>
> BARTON
> DARWIN
> DARWIN
>
> But the script below gives me all 6 letter and more entries.
>
> What I read says {6} means exactly 6.
>
> What is the correct RE?
>
> I have solved the problem by using
>
> if (length($data[1]) == 6 )
>
> but would love to know the correct syntax for the RE
>
> ===
> #!/usr/bin/perl
> use strict;
> use warnings;
>
> while (<DATA>) {
>     my $line = $_;
>     my @line = split /,/;
>     $line[1] =~ s/\"//g;
>     print "$line[1]\n" if $line[1] =~ /\S{6}/;
> }
>
> __DATA__
> 0200,AUSTRALIAN NATIONAL UNIVERSITY,ACT,PO Boxes
> 0221,BARTON,ACT,LVR Special Mailing
> 0800,DARWIN,NT,,DARWIN DELIVERY CENTRE
> 0801,DARWIN,NT,GPO Boxes,DARWIN GPO DELIVERY ANNEXE
> 0804,PARAP,NT,PO Boxes,PARAP LPO
> 0810,ALAWA,NT,,DARWIN DELIVERY CENTRE
> 0810,BRINKIN,NT,,DARWIN DELIVERY CENTRE
> 0810,CASUARINA,NT,,DARWIN DELIVERY CENTRE
> 0810,COCONUT GROVE,NT,,DARWIN DELIVERY CENTRE
> ===

Hi Owen.

Your test establishes only whether the pattern can be found within the object string. A test like

'CASUARINA' =~ /\S{6}/;

finds the six non-space characters 'CASUAR' and then returns success as the criterion has been satisfied. To get it to match /only/ six-character non-space strings you can add anchors at the beginning and end of the regex:

'CASUARINA' =~ /^\S{6}$/;

will fail because the sequence "beginning of line, six non-space characters, end of line" doesn't appear in 'CASUARINA'.

But the proper way to do this is to forget about regular expressions and treat the data as comma-separated fields. The module Text::CSV will do this for you, as per the program below.

HTH,

Rob

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new;

while (my $fields = $csv->getline(*DATA)) {
    my $suburb = $fields->[1];
    next unless $suburb and length $suburb == 6;
    print $suburb, "\n";
}

__DATA__
0200,AUSTRALIAN NATIONAL UNIVERSITY,ACT,PO Boxes
0221,BARTON,ACT,LVR Special Mailing
0800,DARWIN,NT,,DARWIN DELIVERY CENTRE
0801,DARWIN,NT,GPO Boxes,DARWIN GPO DELIVERY ANNEXE
0804,PARAP,NT,PO Boxes,PARAP LPO
0810,ALAWA,NT,,DARWIN DELIVERY CENTRE
0810,BRINKIN,NT,,DARWIN DELIVERY CENTRE
0810,CASUARINA,NT,,DARWIN DELIVERY CENTRE
0810,COCONUT GROVE,NT,,DARWIN DELIVERY CENTRE

**OUTPUT**

BARTON
DARWIN
DARWIN
Re: Regex help
On 5/16/11 Mon May 16, 2011 3:44 PM, Owen rc...@pcug.org.au scribbled:

> I am trying to get all the 6 letter names in the second field in DATA
> below, eg
>
> BARTON
> DARWIN
> DARWIN
>
> But the script below gives me all 6 letter and more entries.
>
> What I read says {6} means exactly 6.

\S{6} will match any string containing 6 consecutive non-whitespace characters. It will also match any string containing more than 6 such characters, because any such string contains within it a substring of exactly six characters. Perl matches do not have to match the entire string.

> What is the correct RE?

If you want exactly six characters, then you need to specify that any characters before or after the wanted six are not also members of the desired class. In your case, the easiest way is to anchor the match at the beginning and the end:

$line[1] =~ /^\S{6}$/

If you were looking for word characters, e.g. \w, you could use the word boundary assertion metasymbol \b:

$line[1] =~ /\b\w{6}\b/

That will not work if your names contain punctuation characters, e.g. O'Reilly. More complex matches can use the negative lookahead and lookbehind constructs.

> I have solved the problem by using
>
> if (length($data[1]) == 6 )
>
> but would love to know the correct syntax for the RE
>
> TIA
>
> Owen
> ===
> #!/usr/bin/perl
> use strict;
> use warnings;
>
> while (<DATA>) {
>     my $line = $_;
>     my @line = split /,/;
>     $line[1] =~ s/\"//g;
>     print "$line[1]\n" if $line[1] =~ /\S{6}/;
> }
>
> __DATA__
> 0200,AUSTRALIAN NATIONAL UNIVERSITY,ACT,PO Boxes
> 0221,BARTON,ACT,LVR Special Mailing
> 0800,DARWIN,NT,,DARWIN DELIVERY CENTRE
> 0801,DARWIN,NT,GPO Boxes,DARWIN GPO DELIVERY ANNEXE
> 0804,PARAP,NT,PO Boxes,PARAP LPO
> 0810,ALAWA,NT,,DARWIN DELIVERY CENTRE
> 0810,BRINKIN,NT,,DARWIN DELIVERY CENTRE
> 0810,CASUARINA,NT,,DARWIN DELIVERY CENTRE
> 0810,COCONUT GROVE,NT,,DARWIN DELIVERY CENTRE
> ===
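To see the difference between the unanchored and anchored forms side by side, here is a small sketch that runs both over a few of the suburb names from the sample data. The helper names contains_six and exactly_six are made up for this illustration.

```perl
use strict;
use warnings;

# Second CSV field from a few of the sample rows.
my @suburbs = ('AUSTRALIAN NATIONAL UNIVERSITY', 'BARTON', 'DARWIN',
               'PARAP', 'CASUARINA', 'COCONUT GROVE');

# Unanchored: keeps any name that contains six consecutive non-space
# characters somewhere inside it.
sub contains_six { return grep { /\S{6}/ } @_ }

# Anchored: keeps a name only when the whole string is exactly six
# non-space characters.
sub exactly_six { return grep { /^\S{6}$/ } @_ }

print "unanchored: ", join(' | ', contains_six(@suburbs)), "\n";
print "anchored:   ", join(' | ', exactly_six(@suburbs)), "\n";
```

Only PARAP is rejected by the unanchored test, since every other name contains a run of at least six non-space characters; the anchored test keeps just BARTON and DARWIN.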
Re: RegEx help please ...
On Mon, 2008-08-11 at 14:02 -0700, Saya wrote: Hi, I have the following issue: my $s = /metadata-files/test-desc.txt,/metadata-files/birthday.txt,/ web-media/images/bday-after-help.jpg,javascript:popUp('/pop-ups/ birthday/main.html','bday-pics',785,460); Now I want $s to be like: /metadata-files/test-desc.txt,/metadata- files/birthday.txt,/web-media/images/bday-after-help.jpg,/pop-ups/ birthday/main.html I have been working with: $s =~ s/javascript:popup\('(.*)',(.*)/$1/gi; $s =~ s/javascript:popup\('([^']*)',(.*)/$1/gi; -- Just my 0.0002 million dollars worth, Shawn Where there's duct tape, there's hope. Perl is the duct tape of the Internet. Hassan Schroeder, Sun's first webmaster -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: RegEx help please ...
Saya wrote:
> I have the following issue:
>
> my $s = "/metadata-files/test-desc.txt,/metadata-files/birthday.txt,/web-media/images/bday-after-help.jpg,javascript:popUp('/pop-ups/birthday/main.html','bday-pics',785,460);";
>
> Now I want $s to be like:
>
> /metadata-files/test-desc.txt,/metadata-files/birthday.txt,/web-media/images/bday-after-help.jpg,/pop-ups/birthday/main.html
>
> I have been working with:
>
> $s =~ s/javascript:popup\('(.*)',(.*)/$1/gi;
>
> But this gives me $s looking like this:
>
> /metadata-files/levemir-desc.txt,/metadata-files/levemir-keywords.txt,/web-media/images/img_insulin_interactive.jpg,/pop-ups/why-insulin/main.html','quickguide'
>
> How can I achieve what I am trying to do? Any help or hints will be
> greatly appreciated.

$s =~ s/javascript:popUp\('(.*?)'.*/$1/;

does what you want. But without seeing all of the possible data you have I can't be sure that it will work in every case.

The main mistake you made was to use a greedy capture /(.*)/ which will match up to the last single-quote in the string, instead of a non-greedy one /(.*?)/ which will match only up to the next single-quote. You also have an unnecessary capture around the trailing /.*/ which is wasteful but will not cause the substitution to fail.

HTH,

Rob
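The greedy versus non-greedy behaviour can be compared directly on the posted string. This is a sketch only: strip_greedy and strip_lazy are hypothetical wrapper names, and the line-wrapped sample string is assumed to have been one unbroken line in the original program.

```perl
use strict;
use warnings;

my $s = "/metadata-files/test-desc.txt,/metadata-files/birthday.txt,"
      . "/web-media/images/bday-after-help.jpg,"
      . "javascript:popUp('/pop-ups/birthday/main.html','bday-pics',785,460);";

# Greedy: (.*) grabs as much as it can, so the capture runs up to the
# LAST "'," in the string.
sub strip_greedy {
    my ($text) = @_;
    $text =~ s/javascript:popUp\('(.*)',(.*)/$1/;
    return $text;
}

# Non-greedy: (.*?) stops at the first closing single quote.
sub strip_lazy {
    my ($text) = @_;
    $text =~ s/javascript:popUp\('(.*?)'.*/$1/;
    return $text;
}

print "greedy: ", strip_greedy($s), "\n";
print "lazy:   ", strip_lazy($s),   "\n";
```

The greedy version leaves `','bday-pics` attached to the path, which is the same kind of leftover the original poster reported; the non-greedy version ends at main.html.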
Re: RegEx help please ...
Mr. Shawn H. Corey wrote: On Mon, 2008-08-11 at 14:02 -0700, Saya wrote: I have the following issue: my $s = /metadata-files/test-desc.txt,/metadata-files/birthday.txt,/ web-media/images/bday-after-help.jpg,javascript:popUp('/pop-ups/ birthday/main.html','bday-pics',785,460); Now I want $s to be like: /metadata-files/test-desc.txt,/metadata- files/birthday.txt,/web-media/images/bday-after-help.jpg,/pop-ups/ birthday/main.html I have been working with: $s =~ s/javascript:popup\('(.*)',(.*)/$1/gi; $s =~ s/javascript:popup\('([^']*)',(.*)/$1/gi; There's no reason to capture $2 and not use it. The global substitution is also unlikely to be correct. Rob -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: RegEx help please ...
On Mon, 2008-08-11 at 23:30 +0100, Rob Dixon wrote: Mr. Shawn H. Corey wrote: On Mon, 2008-08-11 at 14:02 -0700, Saya wrote: I have the following issue: my $s = /metadata-files/test-desc.txt,/metadata-files/birthday.txt,/ web-media/images/bday-after-help.jpg,javascript:popUp('/pop-ups/ birthday/main.html','bday-pics',785,460); Now I want $s to be like: /metadata-files/test-desc.txt,/metadata- files/birthday.txt,/web-media/images/bday-after-help.jpg,/pop-ups/ birthday/main.html I have been working with: $s =~ s/javascript:popup\('(.*)',(.*)/$1/gi; $s =~ s/javascript:popup\('([^']*)',(.*)/$1/gi; There's no reason to capture $2 and not use it. The global substitution is also unlikely to be correct. Rob You are making too many assumptions. The OP only posted one line of code. That does not mean that $2 is not used in the next, in which case it should be captured. And the OP only posted one example. The real data may have more than one match. Isn't one of the guidelines for this list is to prune code that has no bearing on the problem? Don't assume that only what is posted is the whole story. -- Just my 0.0002 million dollars worth, Shawn Where there's duct tape, there's hope. Perl is the duct tape of the Internet. Hassan Schroeder, Sun's first webmaster -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: RegEx help please ...
Mr. Shawn H. Corey wrote: On Mon, 2008-08-11 at 23:30 +0100, Rob Dixon wrote: Mr. Shawn H. Corey wrote: On Mon, 2008-08-11 at 14:02 -0700, Saya wrote: I have the following issue: my $s = /metadata-files/test-desc.txt,/metadata-files/birthday.txt,/ web-media/images/bday-after-help.jpg,javascript:popUp('/pop-ups/ birthday/main.html','bday-pics',785,460); Now I want $s to be like: /metadata-files/test-desc.txt,/metadata- files/birthday.txt,/web-media/images/bday-after-help.jpg,/pop-ups/ birthday/main.html I have been working with: $s =~ s/javascript:popup\('(.*)',(.*)/$1/gi; $s =~ s/javascript:popup\('([^']*)',(.*)/$1/gi; There's no reason to capture $2 and not use it. The global substitution is also unlikely to be correct. You are making too many assumptions. The OP only posted one line of code. That does not mean that $2 is not used in the next, in which case it should be captured. And the OP only posted one example. The real data may have more than one match. Isn't one of the guidelines for this list is to prune code that has no bearing on the problem? Don't assume that only what is posted is the whole story. What is posted cannot possibly be the whole story, and I qualified my answer in my response. I consider it extremely unlikely that the string q~'bday-pics',785,460);~ in $2 is wanted later in the program because I cannot conceive of a likely use of the /last/ such capture in conjunction with a global substitution. If you are contending that we cannot assume anything at all about the unseen part of the program then that precludes almost any useful response altogether. I believe we should make our best guess about the likely context of the question and declare any tentative assumptions. If you think the second capture in the regex and the /g flag on the substitution were probably necessary then I disagree completely. If you agree with me that they were probably unnecessary but stuck them in there anyway without comment then I also disagree with you. 
Rob -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: regex help
On Thu, 2008-07-24 at 09:44 -0400, Tony Heal wrote: I have a text dump of a postgresql database and I want to find out if there are any characters that are not standard keyboard characters. Is there a way to use regex to do this without doing a character by character scan of a 5GB file. No. I want to know where any character that is not one of these is in the file: a-z A-Z 0-9 [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]*()[]{};:',./?|\ *()[]{};:',./?|\ See `perldoc POSIX` and search for 'isalpha' -- Just my 0.0002 million dollars worth, Shawn Where there's duct tape, there's hope. Perl is the duct tape of the Internet. Hassan Schroeder, Sun's first webmaster -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Regex help
Ravi Malghan wrote:

> I want to split this string into an array using comma
> separator, but the problem is some values have one or more
> commas within them.

That is a common problem. First split on comma, then recombine elements by using out-of-band knowledge.

--
Affijn, Ruud

"Gewoon is een tijger."
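One way to read "split, then recombine" for this data is sketched below: split on every comma, then glue pieces back together until each value reaches the length declared in front of it. The out-of-band knowledge is assumed to be that declared length. parse_fields is a hypothetical name, and the sample string is the corrected one with the comma after "ADDR,35" that appears elsewhere in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: parses "name,length,value,name,length,value,..."
# where a value may itself contain commas. Returns [name, value] pairs.
sub parse_fields {
    my ($string) = @_;
    my @parts = split /,/, $string, -1;    # -1 keeps trailing empty pieces
    my @pairs;
    while (@parts) {
        my $name   = shift @parts;
        my $length = shift @parts;
        die "Invalid length for '$name'\n"
            unless defined $length && $length =~ /^[0-9]+$/;

        # Recombine pieces until the value reaches its declared length.
        my $value = @parts ? shift @parts : '';
        while (length($value) < $length && @parts) {
            $value .= ',' . shift @parts;
        }
        die "Value for '$name' is not $length characters long\n"
            unless length($value) == $length;

        push @pairs, [ $name, $value ];
    }
    return @pairs;
}

my $sample = "EMPLID,4,9066,USERID,7,W3LWEB1,TEXT,54,This is a test note,"
           . " with some commas, and more commas,ADDR,35,15421 Test Lane,"
           . " Rockville, MD, USA,ESCALATION-LVL,1,0";

printf "%s = %s\n", @$_ for parse_fields($sample);
```

Dying on a length mismatch is a design choice for the sketch; it surfaces input such as a missing comma after a length instead of silently misaligning every later field.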
Re: Regex help
Rob Dixon wrote:
> Ravi Malghan wrote:
>> Hi: I am trying to extract some stuff from a string and not getting the
>> expected results. I have looked through
>> http://www.perl.com/doc/manual/html/pod/perlre.html and can't seem to
>> figure this one out.
>>
>> I have a string which is a sequence of words and each item is comma
>> separated:
>>
>> field1, length of value1, value1,field2, length of value2, value2,field3,length of value3, value3 and so on
>>
>> After each field name I have the length of the value.
>>
>> I want to split this string into an array using comma separator, but
>> the problem is some values have one or more commas within them.
>>
>> So for example my string might look like this:
>>
>> $origString = "EMPLID,4,9066,USERID,7,W3LWEB1,TEXT,54,This is a test note, with some commas, and more commas,ADDR,3515421 Test Lane, Rockville, MD, USA,ESCALATION-LVL,1,0"
>>
>> My current code goes character by character and constructs what I
>> want. But it is very slow when this string gets large.
>
> The program below will do what you describe.

Here's an improvement that explains when it doesn't find values that it expects.

Rob

use strict;
use warnings;

my $origString = "EMPLID,4,9066,USERID,7,W3LWEB1,TEXT,54,This is a test note, with some commas, and more commas,ADDR,35,15421 Test Lane, Rockville, MD, USA,ESCALATION-LVL,1,0";

while () {

    $origString =~ /\G([^,]+),(\d+),/g or die "No field name / size found";
    my ($field, $size) = ($1, $2);

    $origString =~ /\G(.{$size})/g or die "Insufficient characters for field size";
    my $value = $1;

    printf "%s (%d) - %s\n", $field, $size, $value;

    $origString =~ /\G,/g or last;
}
Re: Regex help
On Fri, Jun 20, 2008 at 3:10 PM, Ravi Malghan [EMAIL PROTECTED] wrote: Hi: I am trying to extract some stuff from a string and not getting the expected results. I have looked through http://www.perl.com/doc/manual/html/pod/perlre.html and can't seem to figure this one out. I have a string which is a sequence of words and each item is comma seperated field1, lengthof value1, value1,field2, length of value2, value2,field3,length of value3, value3 and so on After each field name I have the length of the value I want to split this string into an array using comma seperator, but the problem is some values have one or more commas within them. so for example my string might look like this $origString = EMPLID,4,9066,USERID,7,W3LWEB1,TEXT,54,This is a test note, with some commas, and more commas,ADDR,3515421 Test Lane, Rockville, MD, USA,ESCALATION-LVL,1,0 My current code goes character by character and constructs what I want. But is very slow when this string gets large. TIA Ravi My solution: use strict; use warnings; my $origString = EMPLID,4,9066,USERID,7,W3LWEB1,TEXT,54,This is a test note, with some commas, and more commas,ADDR,3515421 Test Lane, Rockville, MD, USA,ESCALATION-LVL,1,0; my @arr = split (/,/, $origString); # print join (\n, @arr); exit; while ( scalar @arr ) { my $field = shift @arr; last unless ( defined $field ); my $vlength = shift @arr; last unless ( defined $vlength ); unless ( $vlength =~ /^\d+$/ ) { die Invalid length: [$vlength]\n; } my $value = ; while ( length ( $value ) $vlength ) { my $bit = shift @arr; last unless ( defined $bit ); $value .= , if ( length $value ); $value .= $bit; } print $field - $value\n; } Time it? -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Regex help
Ravi Malghan wrote: Hi: I am trying to extract some stuff from a string and not getting the expected results. I have looked through http://www.perl.com/doc/manual/html/pod/perlre.html and can't seem to figure this one out. I have a string which is a sequence of words and each item is comma seperated field1, lengthof value1, value1,field2, length of value2, value2,field3,length of value3, value3 and so on After each field name I have the length of the value I want to split this string into an array using comma seperator, but the problem is some values have one or more commas within them. so for example my string might look like this $origString = EMPLID,4,9066,USERID,7,W3LWEB1,TEXT,54,This is a test note, with some commas, and more commas,ADDR,3515421 Test Lane, Rockville, MD, USA,ESCALATION-LVL,1,0 My current code goes character by character and constructs what I want. But is very slow when this string gets large. $ perl -le' my $origString = EMPLID,4,9066,USERID,7,W3LWEB1,TEXT,54,This is a test note, with some commas, and more commas,ADDR,3515421 Test Lane, Rockville, MD, USA,ESCALATION-LVL,1,0; while ( $origString =~ /([^,]+),(\d+),/g ) { print for $1, $2, substr $origString, pos( $origString ), $2; } ' EMPLID 4 9066 USERID 7 W3LWEB1 TEXT 54 This is a test note, with some commas, and more commas ESCALATION-LVL 1 0 John -- Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order.-- Larry Wall -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Regex help
Ravi Malghan wrote:
> Hi: I am trying to extract some stuff from a string and not getting the
> expected results. I have looked through
> http://www.perl.com/doc/manual/html/pod/perlre.html and can't seem to
> figure this one out.
>
> I have a string which is a sequence of words and each item is comma
> separated:
>
> field1, length of value1, value1,field2, length of value2, value2,field3,length of value3, value3 and so on
>
> After each field name I have the length of the value.
>
> I want to split this string into an array using comma separator, but
> the problem is some values have one or more commas within them.
>
> So for example my string might look like this:
>
> $origString = "EMPLID,4,9066,USERID,7,W3LWEB1,TEXT,54,This is a test note, with some commas, and more commas,ADDR,3515421 Test Lane, Rockville, MD, USA,ESCALATION-LVL,1,0"
>
> My current code goes character by character and constructs what I
> want. But it is very slow when this string gets large.

The program below will do what you describe.

HTH,

Rob

use strict;
use warnings;

my $origString = "EMPLID,4,9066,USERID,7,W3LWEB1,TEXT,54,This is a test note, with some commas, and more commas,ADDR,3515421 Test Lane, Rockville, MD, USA,ESCALATION-LVL,1,0";

while () {

    $origString =~ /\G([^,]+),/g or last;
    my $field = $1;

    $origString =~ /\G(\d+),/g or last;
    my $size = $1;

    $origString =~ /\G(.{$size}),?/g or last;
    my $value = $1;

    printf "%s(%d) - %s\n", $field, $size, $value;
}

**OUTPUT**

EMPLID(4) - 9066
USERID(7) - W3LWEB1
TEXT(54) - This is a test note, with some commas, and more commas
Re: Regex help
Ravi Malghan wrote: I have a string which is a sequence of words and each item is comma seperated field1, lengthof value1, value1,field2, length of value2, value2,field3,length of value3, value3 and so on After each field name I have the length of the value I want to split this string into an array using comma seperator, but the problem is some values have one or more commas within them. Okay. There is a missing comma between ADDR,35 and 15421, right? Under that assumption, I believe this code gets what you want: C:\hometype test.pl my $origString = EMPLID,4,9066,USERID,7,W3LWEB1,TEXT,54, . This is a test note, with some commas, and more commas, . ADDR,35,15421 Test Lane, Rockville, MD, USA,ESCALATION-LVL,1,0; my @parts = split /([A-Z-]+),(\d+)/, $origString; shift @parts; while ( my $k = shift @parts ) { my $length = shift @parts; print $k = , substr( shift @parts, 1, $length ), \n; } C:\hometest.pl EMPLID = 9066 USERID = W3LWEB1 TEXT = This is a test note, with some commas, and more commas ADDR = 15421 Test Lane, Rockville, MD, USA ESCALATION-LVL = 0 C:\home -- Gunnar Hjalmarsson Email: http://www.gunnar.cc/cgi-bin/contact.pl -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Regex help
On Jun 20, 9:10 am, [EMAIL PROTECTED] (Ravi Malghan) wrote: Hi: I am trying to extract some stuff from a string and not getting the expected results. I have looked throughhttp://www.perl.com/doc/manual/html/pod/perlre.html and can't seem to figure this one out. I have a string which is a sequence of words and each item is comma seperated field1, lengthof value1, value1,field2, length of value2, value2,field3,length of value3, value3 and so on After each field name I have the length of the value I want to split this string into an array using comma seperator, but the problem is some values have one or more commas within them. so for example my string might look like this $origString = EMPLID,4,9066,USERID,7,W3LWEB1,TEXT,54,This is a test note, with some commas, and more commas,ADDR,3515421 Test Lane, Rockville, MD, USA,ESCALATION-LVL,1,0 My current code goes character by character and constructs what I want. But is very slow when this string gets large. TIA Ravi I want to split this string into an array using comma seperator, but the problem is some values have one or more commas within them [..] snip My current code goes character by character and constructs what I want. But is very slow when this string gets large. Post your code or relevant portion. Otherwise we might repeating here stuff what you've done or tried already. Is there any way you can use another delimiter such as tildes ~ or something? If you tweak to accept other delimiters that would be easier to treat. If you cannot, you could use regex to find the next alpha_num character of the string and put those into an array, \w Match a word character (alphanumeric plus _) \W Match a non-word character \b Match a word boundary \B Match a non-(word boundary) or find out exactly the number of commas it may have and weed them out... * Match 0 or more times + Match 1 or more times ? Match 1 or 0 times {n}Match exactly n times {n,} Match at least n times {n,m} Match at least n but not more than m times etc.. 
But again, post your code so we don't overlap... -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Regex help?
sanket vaidya wrote:
> HI all,

Hello,

> Kindly go through the code below.
>
> use warnings;
> use strict;
>
> my $i=1;
>
> while($i<=10)
> {
> $_ = "abcpqr";
> $_=~ s/(?=pqr)/$i/;
> print "$_\n";
> $i++;
> }
>
> Output:
> abc1pqr
> abc2pqr
> abc3pqr
> abc4pqr
> abc5pqr
> abc6pqr
> abc7pqr
> abc8pqr
> abc9pqr
> abc10pqr
>
> The expected output is
>
> abc001pqr
> abc002pqr
> abc003pqr
> abc004pqr
> abc005pqr
> abc006pqr
> abc007pqr
> abc008pqr
> abc009pqr
> abc010pqr
>
> Can anyone suggest how to get that output using regex, i.e. can this
> happen by making a change in the regex I used in the code?

$ perl -e'
use warnings;
use strict;

for my $i ( 1 .. 10 ) {
    $_ = "abcpqr";
    $_ =~ s/(?=pqr)/sprintf q[%03d], $i/e;
    print "$_\n";
    }
'
abc001pqr
abc002pqr
abc003pqr
abc004pqr
abc005pqr
abc006pqr
abc007pqr
abc008pqr
abc009pqr
abc010pqr

Or maybe this would be better:

$ perl -e'
use warnings;
use strict;

$_ = "abcpqr";
$_ =~ s/(?=pqr)/%03d/;
for my $i ( 1 .. 10 ) {
    printf "$_\n", $i;
    }
'
abc001pqr
abc002pqr
abc003pqr
abc004pqr
abc005pqr
abc006pqr
abc007pqr
abc008pqr
abc009pqr
abc010pqr

John
--
Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order. -- Larry Wall
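The /e approach can also be wrapped in a small function with a configurable width. This is a sketch under assumptions: insert_counter is a hypothetical name, and the "pqr" marker is taken from the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: inserts $n, zero-padded to $width digits,
# immediately before the first "pqr" in $text. The /e flag makes Perl
# evaluate the replacement as code, and "*" in the sprintf format takes
# the field width from the argument list.
sub insert_counter {
    my ($text, $n, $width) = @_;
    $text =~ s/(?=pqr)/sprintf '%0*d', $width, $n/e;
    return $text;
}

print insert_counter('abcpqr', $_, 3), "\n" for 1, 10, 100;
```

Because (?=pqr) is a zero-width lookahead, nothing is removed from the string; the formatted number is only inserted at that position.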
Re: Regex help?
sanket vaidya wrote:

> use warnings;
> use strict;
>
> my $i=1;
>
> while($i<=10)
> {
> $_ = "abcpqr";
> $_=~ s/(?=pqr)/$i/;
> print "$_\n";
> $i++;
> }
> [...]
> The expected output is
>
> abc001pqr
> [...]
> Can anyone suggest how to get that output using regex, i.e. can this
> happen by making a change in the regex I used in the code?

Why use a regular expression, or even a substitution?

perl -Mstrict -Mwarnings -e'
  printf q{abc%03dpqr%s}, $_, $/ for 1..3;
'
abc001pqr
abc002pqr
abc003pqr

--
Affijn, Ruud

"Gewoon is een tijger."
Re: Regex help
See the code below. The trick here is that $& is a special variable that holds the entire matched string. So in this case, the /[$&]/ regexp literal is equal to /[.type=xmlrpc&]/.

#!/bin/perl

$url = 'http://abc.com/test.cgi?TEST=1&.type=xmlrpc&type=2';
($r1) = $url =~ /\.type=(.+?)(&|$)/;
print "\$&=$&\n";

($r2) = $url =~ /\.type=(.+?)[$&]/;
print "\$r1=$r1\n\$r2=$r2\n"

__DATA__
$&=.type=xmlrpc&
$r1=xmlrpc
$r2=x

-Todd
Re: Regex help
On Saturday 24 November 2007 22:59, Todd wrote:
> See the code below. The trick here is that $& is a special variable
> that holds the entire matched string. So in this case, the /[$&]/
> regexp literal is equal to /[.type=xmlrpc&]/.

Right. And that is a character class which says to match *one* character, either '.' or '=' or '&' or 'c' or 'e' or 'l' or 'm' or 'p' or 'r' or 't' or 'x' or 'y'.

> #!/bin/perl
>
> $url = 'http://abc.com/test.cgi?TEST=1&.type=xmlrpc&type=2';
> ($r1) = $url =~ /\.type=(.+?)(&|$)/;
> print "\$&=$&\n";
>
> ($r2) = $url =~ /\.type=(.+?)[$&]/;
> print "\$r1=$r1\n\$r2=$r2\n"
>
> __DATA__
> $&=.type=xmlrpc&
> $r1=xmlrpc
> $r2=x

Which is why $r2=x because (.+?) matches the 'x' after '.type=' and [.type=xmlrpc&] matches the 'm' after '.type=x'.

Also, using $& (or $` or $') slows down all regular expressions in the program.

John
--
use Perl;
program
fulfillment
Re: Regex help
howa wrote:
> Hello,
>
> I want to match a query string, e.g.
>
> http://abc.com/test.cgi?TEST=1&.type=xmlrpc&type=2
>
> I want to extract the .type, currently I use
>
> $ENV{QUERY_STRING} =~ /\.type=(.+?)(&|$)/; # This is okay
>
> but the back reference seems no good, so I rewrote it as
>
> $ENV{QUERY_STRING} =~ /\.type=(.+?)[&$]/; # seems better, but not working
>
> Any idea?

Unless you are committed to using a regex, a nicer idea may be to use the excellent URI modules, as in the program below.

HTH,

Rob

use strict;
use warnings;

use URI;
use URI::QueryParam;

my $uri = URI->new('http://abc.com/test.cgi?TEST=1&.type=xmlrpc&type=2');
print $uri->query_param('.type');

**OUTPUT**

xmlrpc
Re: Regex help
On Thursday 22 November 2007 23:23, howa wrote:
> Hello,

Hello,

> I want to match a query string, e.g.
>
> http://abc.com/test.cgi?TEST=1&.type=xmlrpc&type=2
>
> I want to extract the .type,
>
> currently i use
>
> $ENV{QUERY_STRING} =~ /\.type=(.+?)(&|$)/; # This is okay
>
> but back reference seem no good,

Why does it seem no good? Perhaps you should use non-capturing parentheses instead. Or maybe this would work better:

    $ENV{ QUERY_STRING } =~ /\.type=([^&]+)/;

> i rewrite as
>
> $ENV{QUERY_STRING} =~ /\.type=(.+?)[$&]/; # seem better, but not working

[$&] is a character class that matches either the '$' character or the '&' character. In the previous example '$' in a pattern but not in a character class is a meta-character that matches at end-of-line, not the literal '$' character.

John
--
use Perl;
program
fulfillment
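The non-capturing parentheses suggested above could look roughly like this. It is a sketch only: the helper name `dot_type` is hypothetical, and the pattern is the one from the thread with `(&|$)` turned into `(?:&|$)` so that only the value itself is captured.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value of the ".type" parameter,
# or undef when the parameter is absent.
sub dot_type {
    my ($query) = @_;

    # (?:&|$) groups the alternation without creating a $2.
    return $query =~ /\.type=(.+?)(?:&|$)/ ? $1 : undef;
}

print dot_type('TEST=1&.type=xmlrpc&type=2'), "\n";   # value followed by '&'
print dot_type('TEST=1&.type=xmlrpc'), "\n";          # value at end of string
```

Both calls print `xmlrpc`; the end-of-string case is the one a plain character class such as `[&]` cannot express.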
Re: Regex Help
Omega -1911 wrote:
>> which isn't an equivalent to yours - it simply makes sure that the record contains 'Powerball:' and at least one digit - but I'm sure it is adequate. My own solution didn't even do this much checking, since I read the OP as saying that all irrelevant data records had been removed.
>
> I appreciate the help as I am understanding the examples, but when I ran Dr. Rudd's example, I had weird data in the common array (Notice the number 440):

I'd guess the first example didn't include the conditional checking the data format and there were some 440 HTTP errors while retrieving the data (obviously hinging on whether that applies to the data source, especially since it was initially specified as a web page rather than a web site).

> common : 120, 10, 07, 440, 6, 120, 7, 07, 440, 22, 120, 3, 07, 440, 1, [...]
> powerball : 22, 29, 31, 16, 25, 11, 11, 15, 30, 16, [...]
>
> BUT, when I run his other example (see below), everything worked as well as the other examples you all supplied:
>
> while (<DATA>) {
>   if (my @numbers =
>       /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/) {
>     push @common, @numbers[0..4];
>     push @powerball, $numbers[5];
>   } else {
>     ...
>   }
> }
Re: Regex Help
From: Dr.Ruud [EMAIL PROTECTED]
> Jenda Krynicky wrote:
>
>> { my $static; sub foo { $static++; ... } }
>
> There (the first declared version of) the variable $static is part of the environment of foo(). Don't mistake that for staticness.

Maybe I don't know what staticness means then. I thought a static variable is one that is private to a function, but keeps the value between the function's invocations. How do you define staticness?

> In Perl 5.8.8 you can enforce $static to be static, like this:
>
> { 0 and my $static; sub foo { $static++; ... } }
>
> That ugly my() can only occur once, but it still makes the variable lexical. There is just no better way to set up a real static variable in Perl 5.8.8.
>
> Check out the differences between the following two academic examples:
>
> $ perl -le'
>   for (7..9) {
>     my $static = $_;  # declared and initialised 3 times
>     sub foo {
>       $static++;      # uses the first of the declared $static's
>       print "foo:$static";
>     }
>     foo() for 0..1;
>     print "for:$static";
>   }
> '

With -w you get a "Variable "$static" will not stay shared" warning. And rightly so. You are doing something you are not supposed to do. A named subroutine inside another subroutine or a loop is a red flag. Something that (unless found in an obfuscation) suggests that the author of the code misunderstood something.

It's yet another "please don't do this".

    $ perl -le'
      for (7..9) {
        my $static = $_;  # declared and initialised 3 times
        my $foo = sub {
          $static++;      # uses the $static of the current iteration
          print "foo:$static";
        };
        $foo->() for 0..1;
        print "for:$static";
      }
    '

Jenda
= [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
    -- Terry Pratchett in Sourcery
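To make the difference from the named-sub version concrete, here is a small sketch (the names `@counters` and `$count` are made up for the example). An anonymous sub created inside the loop closes over the variable of its own iteration, so each closure counts independently.

```perl
use strict;
use warnings;

my @counters;
for my $start (7 .. 9) {
    my $count = $start;                        # a fresh $count per iteration
    push @counters, sub { return ++$count };   # each closure keeps its own one
}

print $counters[0]->(), "\n";   # 8
print $counters[0]->(), "\n";   # 9
print $counters[2]->(), "\n";   # 10
```

The closure built in the second iteration has not been called yet, so it still starts from 8 regardless of what the others have done.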
Re: Regex Help
Jenda Krynicky wrote:

> if
>   0 and my $x;
> creates a static $x I call it a bug.

It's called a feature.

--
Affijn, Ruud

Gewoon is een tijger.
Re: Regex Help
Jenda Krynicky wrote:

> I'd definitely never ever do
>   condition and my $x = blah();

That is what I said. It is technically OK to use it with a condition that can not be decided at compile time, but I still recommend not to use it.

--
Affijn, Ruud

Gewoon is een tijger.
Re: Regex Help
Jenda Krynicky wrote:

> { my $static; sub foo { $static++; ... } }

There (the first declared version of) the variable $static is part of the environment of foo(). Don't mistake that for staticness.

In Perl 5.8.8 you can enforce $static to be static, like this:

    { 0 and my $static; sub foo { $static++; ... } }

That ugly my() can only occur once, but it still makes the variable lexical. There is just no better way to set up a real static variable in Perl 5.8.8.

Check out the differences between the following two academic examples:

    $ perl -le'
      for (7..9) {
        my $static = $_;  # declared and initialised 3 times
        sub foo {
          $static++;      # uses the first of the declared $static's
          print "foo:$static";
        }
        foo() for 0..1;
        print "for:$static";
      }
    '
    foo:8
    foo:9
    for:9
    foo:10
    foo:11
    for:8  (would be undef without the initialisation)
    foo:12
    foo:13
    for:9  (would be undef without the initialisation)

    $ perl -le'
      for (7..9) {
        0 and my $static = $_;  # declared *once*,
                                # *never* initialised
        sub foo {
          $static++;
          print "foo:$static";
        }
        foo() for 0..1;
        print "for:$static";
      }
    '
    foo:1
    foo:2
    for:2
    foo:3
    foo:4
    for:4
    foo:5
    foo:6
    for:6

--
Affijn, Ruud

Gewoon is een tijger.
Re: Regex Help
From: Dr.Ruud [EMAIL PROTECTED]
> Rob Dixon wrote:
>> Dr.Ruud:
>>> John W . Krahn:
>>>> /Powerball:/ and my @numbers = /\d+/g;
>>>
>>> I wouldn't use such a conditional my.
>>
>> There is no conditional 'my': it is a de[c]laration.
>
> I call it a conditional my. A my can be just a declaration, or a declaration and an initialisation. In this case only the initialisation is conditional.
>
> A my in a condition has special behaviour if the condition is constant false: "0 and my $var;" creates a static $var.
>
> As I wrote: *I* wouldn't use *such* a conditional my. I put the declaration on its own line, just before the conditional initialisation.
>
> I sometimes use a conditional my if I want the static behaviour, but not in production code. Perl 5.10 has static.

Perl 5.x has

    { my $static; sub foo { $static++; ... } }

which even lets you create variables that are shared by several subroutines.

I do understand you might want to use my() like this:

    open my $FH, '<', $filename or die $^E;

or

    if (my $foo = foo($x, $y, $z) and my $bar = bar(1,2,3)) {
        # and use $foo and $bar here
    }

but I'd definitely never ever do

    condition and my $x = blah();

and if

    0 and my $x;

creates a static $x I call it a bug.

Jenda
= [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
    -- Terry Pratchett in Sourcery
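The two documented ways of keeping a value between calls can be put side by side. This is only a sketch with made-up sub names (`bump`, `calls`, `next_ticket`); note that the keyword Perl 5.10 actually provides is spelled `state`, and it has to be enabled with `use feature 'state'`.

```perl
use strict;
use warnings;
use feature 'state';    # available from Perl 5.10 onwards

# Closure style from the thread: $calls lives in the enclosing block and
# can be shared by several subroutines.
{
    my $calls = 0;
    sub bump  { return ++$calls }
    sub calls { return $calls }
}

# 'state' style: initialised once, visible only inside the sub.
sub next_ticket {
    state $n = 0;
    return ++$n;
}

bump() for 1 .. 3;
print calls(), "\n";         # 3
print next_ticket(), "\n";   # 1
print next_ticket(), "\n";   # 2
```

Neither form relies on the `0 and my $x` behaviour argued about above, which is why they are the safer choice for code that has to survive a Perl upgrade.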
Re: Regex Help
On Saturday 10 November 2007 06:39, Dr.Ruud wrote:
> Jonathan Lang wrote:
>> while (<DATA>) {
>>   ($a[0], $a[1], $a[2], $a[3], $a[4], $b) =
>>     /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/;
>>   push @common, @a;
>>   push @powerball, $b;
>> }
>
> A slightly different way to do that, is:
>
> while (<DATA>) {
>   if (my @numbers =
>       /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/) {

Another way to do that:

    /Powerball:/ and my @numbers = /\d+/g;

>     push @common, @numbers[0..4];
>     push @powerball, $numbers[5];
>   } else {
>     ...
>   }
> }

John
--
use Perl;
program
fulfillment
Re: Regex Help
John W . Krahn wrote:
> Dr.Ruud:
>> Jonathan Lang:
>>> while (<DATA>) {
>>>   ($a[0], $a[1], $a[2], $a[3], $a[4], $b) =
>>>     /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/;
>>>   push @common, @a;
>>>   push @powerball, $b;
>>> }
>>
>> A slightly different way to do that, is:
>>
>> while (<DATA>) {
>>   if (my @numbers =
>>       /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/) {
>
> Another way to do that:
>
> /Powerball:/ and my @numbers = /\d+/g;
>
>>     push @common, @numbers[0..4];
>>     push @powerball, $numbers[5];
>>   } else {
>>     ...
>>   }
>> }

I wouldn't use such a conditional my. So maybe you meant it more like:

    if ( /Powerball:/ ) {
        if ( (my @numbers = /\d+/g) >= 5 ) {
            push @common, @numbers[0..4];
            push @powerball, $numbers[5];
        }
        else {
        }
    }
    else {
        ...
    }

For example:

    #!/usr/bin/perl
    use strict;
    use warnings;

    {
        local ($", $\) = (", ", "\n");

        my @common;
        my @powerball;

        while (<DATA>) {
            if ( /Powerball:/ ) {
                if ( (my @numbers = /\b\d+\b/g) > 5 ) {
                    push @common, @numbers[0..4];
                    push @powerball, $numbers[5];
                }
                else {
                    print <<EOS;
    *
    * ERROR
    * parsing input line $.
    *
    EOS
                }
            }
            else {
                # do nothing
            }
        }
        print "common: @common";
        print "powerball : @powerball";
    }

    __DATA__
    abc 01 def 02 ghi 03 ijk 04 lmn 05 Powerbalx: 06 xyz
    abc 11 def 12 ghi 13 ijk 14 lmn 15 Powerball: 16 xyz
    abc 21 def 22 ghi 23 ijk 24 lmn 25 Powerball: X6 xyz
    abc 31 def 32 ghi 33 ijk 34 lmn 35 Powerball: 36 xyz
    test
    abc 41 def 42 ghi 43 ijk 44 lmn 45 Powerball: 46.3 xyz

--
Affijn, Ruud

Gewoon is een tijger.
Re: Regex Help
Dr.Ruud wrote:
> John W . Krahn wrote:
>> Dr.Ruud:
>>> Jonathan Lang:
>>>> while (<DATA>) {
>>>>   ($a[0], $a[1], $a[2], $a[3], $a[4], $b) =
>>>>     /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/;
>>>>   push @common, @a;
>>>>   push @powerball, $b;
>>>> }
>>>
>>> A slightly different way to do that, is:
>>>
>>> while (<DATA>) {
>>>   if (my @numbers =
>>>       /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/) {
>>
>> Another way to do that:
>>
>> /Powerball:/ and my @numbers = /\d+/g;
>>
>>>     push @common, @numbers[0..4];
>>>     push @powerball, $numbers[5];
>>>   } else {
>>>     ...
>>>   }
>>> }
>
> I wouldn't use such a conditional my. So maybe you meant it more like:
>
> if ( /Powerball:/ ) {
>   if ( (my @numbers = /\d+/g) >= 5 ) {
>     push @common, @numbers[0..4];
>     push @powerball, $numbers[5];
>   }
>   else {
>   }
> }
> else {
>   ...
> }

There is no conditional 'my': it is a declaration. I believe John was suggesting a replacement just for your conditional expression:

    if (/Powerball:/ and my @numbers = /\d+/g) {
        push @common, @numbers[0..4];
        push @powerball, $numbers[5];
    }
    else {
        :
    }

which isn't an equivalent to yours - it simply makes sure that the record contains 'Powerball:' and at least one digit - but I'm sure it is adequate. My own solution didn't even do this much checking, since I read the OP as saying that all irrelevant data records had been removed.

Rob
Re: Regex Help
which isn't an equivalent to yours - it simply makes sure that the record contains 'Powerball:' and at least one digit - but I'm sure it is adequate. My own solution didn't even do this much checking, since I read the OP as saying that all irrelevant data records had been removed. I appreciate the help as I am understanding the examples, but when I ran Dr. Rudd's example, I had weird data in the common array (Notice the number 440): common : 120, 10, 07, 440, 6, 120, 7, 07, 440, 22, 120, 3, 07, 440, 1, 120, 31, 07, 440, 6, 120, 27, 07, 440, 13, 120, 24, 07, 440, 10, 120, 20, 07, 440, 10, 120, 17, 07, 440, 14, 120, 13, 07, 440, 21, 120, 10, 07, 440, 12, 120, 6, 07, 440, 8, 120, 3, 07, 440, 2, 120, 29, 07, 440, 31, 120, 26, 07, 440, 25, 120, 22, 07, 440, 4, 120, 19, 07, 440, 20, 120, 15, 07, 440, 13, 120, 12, 07, 440, 5, 120, 8, 07, 440, 7, 120, 5, 07, 440, 11, 120, 1, 07, 440, 12, 120, 29, 07, 440, 13, 120, 25, 07, 440, 2, 120, 22, 07, 440, 12, 120, 18, 07, 440, 12, 120, 15, 07, 440, 19, 120, 11, 07, 440, 1, 120, 8, 07, 440, 9, 120, 4, 07, 440, 2, 120, 1, 07, 440, 9, 120, 28, 07, 440, 15, 120, 25, 07, 440, 28, 120, 21, 07, 440, 14, 120, 18, 07, 440, 3, 120, 14, 07, 440, 1, 120, 11, 07, 440, 8, 120, 7, 07, 440, 15, 120, 4, 07, 440, 1, 120, 30, 07, 440, 24, 120, 27, 07, 440, 9, 120, 23, 07, 440, 14, 120, 20, 07, 440, 23, 120, 16, 07, 440, 4, 120, 13, 07, 440, 10, 120, 9, 07, 440, 7, 120, 6, 07, 440, 5, 120, 2, 07, 440, 2, 120, 30, 07, 440, 7, 120, 26, 07, 440, 1, 120, 23, 07, 440, 3, 120, 19, 07, 440, 3, 120, 16, 07, 440, 6, 120, 12, 07, 440, 30, 120, 9, 07, 440, 2, 120, 5, 07, 440, 13, 120, 2, 07, 440, 1, 120, 28, 07, 440, 16, 120, 25, 07, 440, 12, 120, 21, 07, 440, 22, 120, 18, 07, 440, 6, 120, 14, 07, 440, 12, 120, 11, 07, 440, 6, 120, 7, 07, 440, 2, 120, 4, 07, 440, 19, 120, 31, 07, 440, 2, 120, 28, 07, 440, 6, 120, 24, 07, 440, 10, 120, 21, 07, 440, 16, 120, 17, 07, 440, 7, 120, 14, 07, 440, 4, 120, 10, 07, 440, 14, 120, 7, 07, 440, 13, 120, 3, 07, 440, 1, 120, 
28, 07, 440, 13, 120, 24, 07, 440, 36, 120, 21, 07, 440, 2, 120, 17, 07, 440, 1, 120, 14, 07, 440, 3, 120, 10, 07, 440, 2, 120, 7, 07, 440, 4, 120, 3, 07, 440, 12, 120, 31, 07, 440, 2, 120, 27, 07, 440, 10, 120, 24, 07, 440, 9, 120, 20, 07, 440, 1, 120, 17, 07, 440, 16, 120, 13, 07, 440, 1, 120, 10, 07, 440, 36, 120, 6, 07, 440, 1, 120, 3, 07, 440, 10, 120, 30, 06, 440, 9, 120, 27, 06, 440, 14, 120, 23, 06, 440, 8, 120, 20, 06, 440, 1, 120, 16, 06, 440, 5, 120, 13, 06, 440, 19, 120, 9, 06, 440, 19, 120, 6, 06, 440, 7, 120, 2, 06, 440, 17, 120, 29, 06, 440, 2, 120, 25, 06, 440, 5, 120, 22, 06, 440, 22, 120, 18, 06, 440, 1, 120, 15, 06, 440, 11, 120, 11, 06, 440, 35

powerball : 22, 29, 31, 16, 25, 11, 11, 15, 30, 16, 30, 4, 33, 27, 9, 25, 16, 24, 12, 20, 19, 16, 8, 37, 15, 22, 10, 16, 23, 16, 19, 35, 30, 9, 21, 20, 21, 2, 38, 11, 15, 31, 8, 13, 10, 9, 22, 23, 5, 11, 19, 7, 44, 13, 21, 8, 22, 13, 26, 10, 21, 15, 28, 30, 5, 20, 38, 24, 17, 27, 18, 16, 5, 20, 38, 8, 15, 26, 11, 22, 13, 15, 19, 19, 5, 35, 21, 42, 24, 12, 23, 16, 27, 6, 17, 32, 22, 34, 34, 8, 18, 32, 8, 28, 38

BUT, when I run his other example (see below), everything worked as well as the other examples you all supplied:

    while (<DATA>) {
        if (my @numbers =
            /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/) {
            push @common, @numbers[0..4];
            push @powerball, $numbers[5];
        } else {
            ...
        }
    }
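One possible way such output can arise, sketched under a loud assumption: the input line below is invented, since the real data was never posted. A bare `/\d+/g` collects every run of digits on the line, so any number that precedes the draw ends up in the first five slots, while the explicit six-group pattern only matches the comma-separated sequence next to "Powerball:".

```perl
use strict;
use warnings;

# Invented input: three unrelated numbers followed by a real draw.
my $line = 'width=120 10/07 id=440 6, 16, 18, 29, 37, Powerball: 24';

# Loose: every run of digits, wherever it appears.
my @loose = $line =~ /\d+/g;

# Strict: five comma-separated numbers immediately before "Powerball:".
my @strict = $line =~ /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/;

print "loose  common: @loose[0..4]\n";    # 120 10 07 440 6
print "strict common: @strict[0..4]\n";   # 6 16 18 29 37
print "strict powerball: $strict[5]\n";   # 24
```

Whether this is what happened in the run above cannot be told from the thread; the sketch only shows why the two patterns can disagree on the same line.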
Re: Regex Help
Rob Dixon wrote:
> Dr.Ruud:
>> John W . Krahn:
>>> /Powerball:/ and my @numbers = /\d+/g;
>>
>> I wouldn't use such a conditional my.
>
> There is no conditional 'my': it is a de[c]laration.

I call it a conditional my. A my can be just a declaration, or a declaration and an initialisation. In this case only the initialisation is conditional.

A my in a condition has special behaviour if the condition is constant false: "0 and my $var;" creates a static $var.

As I wrote: *I* wouldn't use *such* a conditional my. I put the declaration on its own line, just before the conditional initialisation.

I sometimes use a conditional my if I want the static behaviour, but not in production code. Perl 5.10 has static.

--
Affijn, Ruud

Gewoon is een tijger.
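The style described above, with the declaration on its own line and only the initialisation made conditional, might be sketched like this. The sub name `numbers_from` is hypothetical; the sample lines are taken from the test data posted earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the whole numbers found on a line, but only
# if the line mentions "Powerball:".
sub numbers_from {
    my ($line) = @_;

    my @numbers;                        # plain declaration, always executed
    @numbers = $line =~ /\b\d+\b/g      # conditional initialisation
        if $line =~ /Powerball:/;

    return @numbers;
}

my @good = numbers_from('abc 11 def 12 ghi 13 ijk 14 lmn 15 Powerball: 16 xyz');
my @bad  = numbers_from('abc 01 def 02 ghi 03 ijk 04 lmn 05 Powerbalx: 06 xyz');

print scalar(@good), " numbers: @good\n";   # 6 numbers: 11 12 13 14 15 16
print scalar(@bad),  " numbers\n";          # 0 numbers
```

Because `my @numbers;` always runs, the array is a fresh, empty lexical on every call, whatever the condition turns out to be.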
Re: Regex Help
On Nov 10, 2007 5:10 PM, Omega -1911 [EMAIL PROTECTED] wrote:
> What I will need to be able to do is place the most common 5 numbers (before the word powerball) into an array then place the powerball numbers into another array. Thanks in advance.
>
> @liners = split /(\s\[0-9],\s)Powerball:\s[0-9]/,$data_string;
>
> _DATA_
> 22, 29, 35, 46, 52, Powerball: 2, Power Play: 5
> 1, 31, 38, 40, 53, Powerball: 42, Power Play: 2
> 6, 16, 18, 29, 37, Powerball: 24, Power Play: 2

Hi,

I just think the data structure you need is a hash, not two arrays. The entire code can be:

    use strict;
    use warnings;
    use Data::Dumper;

    my %hash;
    while(<DATA>) {
        my ($li,$powerb) = /^(.+)\,\s*Powerball\:\s*(\d+)/;
        $hash{$powerb} = [split/,/,$li];
    }

    print Dumper \%hash;

    __DATA__
    22, 29, 35, 46, 52, Powerball: 2, Power Play: 5
    1, 31, 38, 40, 53, Powerball: 42, Power Play: 2
    6, 16, 18, 29, 37, Powerball: 24, Power Play: 2
Re: Regex Help
Omega -1911 wrote:
> @liners = split /(\s\[0-9],\s)Powerball:\s[0-9]/,$data_string;

Instead of split, just do a pattern match:

    ($a[0], $a[1], $a[2], $a[3], $a[4], $b) =
        /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/;

This puts the first five numbers into the array @a, and puts the powerball number into scalar $b.

Note that this tackles a single line of data. To get everything, cycle through the lines using a "while (<DATA>)" and push the results onto the two arrays as you get them:

    push @common, @a;
    push @powerball, $b;

In whole, you get:

    while (<DATA>) {
        ($a[0], $a[1], $a[2], $a[3], $a[4], $b) =
            /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/;
        push @common, @a;
        push @powerball, $b;
    }

When you're done, @common is (22, 29, 35, 46, 52, 1, 31, 38, 40, 53, 6, 16, 18, 29, 37), and @powerball is (2, 42, 24).

--
Jonathan "Dataweaver" Lang
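Put together as a self-contained script, the approach above could look like the following sketch. It reads from an in-memory list (`@lines`, an assumption made so the example runs without a data file) and adds a check that the line matched at all, which the loop above leaves out.

```perl
use strict;
use warnings;

# Sample records from the original question.
my @lines = (
    '22, 29, 35, 46, 52, Powerball: 2, Power Play: 5',
    '1, 31, 38, 40, 53, Powerball: 42, Power Play: 2',
    '6, 16, 18, 29, 37, Powerball: 24, Power Play: 2',
);

my (@common, @powerball);
for my $line (@lines) {
    # In list context a successful match returns the six captured numbers;
    # a failed match returns an empty list, which is false in the condition.
    if (my @n = $line =~ /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/) {
        push @common,    @n[0 .. 4];
        push @powerball, $n[5];
    }
    else {
        warn "skipping unparsable line: $line\n";
    }
}

print "common: @common\n";
print "powerball: @powerball\n";
```

Without the `if`, a line that fails to match would leave the values from the previous record in place and push them a second time.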
Re: Regex Help
Jonathan Lang wrote:

> while (<DATA>) {
>   ($a[0], $a[1], $a[2], $a[3], $a[4], $b) =
>     /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/;
>   push @common, @a;
>   push @powerball, $b;
> }

A slightly different way to do that, is:

    while (<DATA>) {
        if (my @numbers =
            /(\d+), (\d+), (\d+), (\d+), (\d+), Powerball: (\d+)/) {
            push @common, @numbers[0..4];
            push @powerball, $numbers[5];
        } else {
            ...
        }
    }

--
Affijn, Ruud

Gewoon is een tijger.
Re: Regex Help
Thank you both Dr.Ruud & Jonathan Lang. I will give both examples a try later today and let you know how it all turns out.
Re: regex help
On Sep 25, 4:33 pm, [EMAIL PROTECTED] (Rob Dixon) wrote:
> Jonathan Lang wrote:
>> Rob Dixon wrote:
>>> Jonathan Lang wrote:
>>>> I'm trying to devise a regex that matches from the first double-quote character found to the next double-quote character that isn't part of a pair; but for some reason, I'm having no luck. Here's what I tried:
>>>>
>>>>   /"(.*?)"(?!")/
>>>>
>>>> Sample text:
>>>>
>>>>   author: "Jonathan ""Dataweaver"" Lang" key=val
>>>>
>>>> What I'm getting for $1 in the first match:
>>>>
>>>>   Jonathan "
>>>>
>>>> What I'm looking for:
>>>>
>>>>   Jonathan ""Dataweaver"" Lang
>>>>
>>>> What did I miss, and how can I most efficiently perform the desired match?
>>>
>>> Your regex looks for the first double-quote and then captures everything after that up to the first subsequent double-quote that isn't followed immediately by another one. The second quote of the pair before 'Dataweaver' matches this criterion so your regex captures up to the character before it.
>>>
>>> This:
>>>
>>>   $str =~ /"((?:.*?"")*.*?)"/;
>>>
>>> should do what you want. After finding the first double-quote it captures all following sequences ending in a pair of double quotes, plus anything after those up to the closing quote.
>>
>> Ah. I had tried /"((.*?"")*.*?)"/ and hadn't gotten it to work; it never occurred to me to try the non-capturing group instead.
>
> That also works! (But is performing unnecessary and wasteful captures.)
>
> Rob
>
> use strict;
> use warnings;
>
> my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val);
> $str =~ /"((.*?"")*.*?)"/;
> print $1, "\n";
>
> **OUTPUT**
>
> Jonathan ""Dataweaver"" Lang

    use strict;
    use warnings;

    my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val fly-in-ointment: "Brian ""Nobull"" McCauley");
    $str =~ /"((.*?"")*.*?)"/;
    print $1, "\n";
    __END__

**OUTPUT**

Jonathan ""Dataweaver"" Lang" key=val fly-in-ointment: "Brian ""Nobull"" McCauley

An alternative pattern would be

    /"((?:[^"]*"")*.*?)"/

although the behaviour of that may be counter-intuitive if presented with bad input in which there's no closing quote.

My preferred pattern would be much closer to Jonathan's original:

    /"((?:[^"]|"")*)"(?!")/

This has the advantage of failing to match if presented with input that lacks a closing quote.
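As a sketch of how the last pattern behaves on the two-field input above (the sub name `first_quoted` and the unescaping step are additions for illustration, and the unescaping assumes that a doubled quote stands for one literal quote):

```perl
use strict;
use warnings;

# Hypothetical helper: returns the contents of the first quoted field,
# with doubled quotes still in place, or undef if nothing matches.
sub first_quoted {
    my ($text) = @_;
    return $text =~ /"((?:[^"]|"")*)"(?!")/ ? $1 : undef;
}

my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val fly-in-ointment: "Brian ""Nobull"" McCauley");

my $field = first_quoted($str);
print "$field\n";    # Jonathan ""Dataweaver"" Lang

# Assumption: "" inside a field is an escaped ", so collapse each pair.
(my $plain = $field) =~ s/""/"/g;
print "$plain\n";    # Jonathan "Dataweaver" Lang
```

Because `[^"]` can never step over a quote, the capture stops at the end of the first field instead of running on into the second one.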
Re: regex help
Jonathan Lang wrote:
> I'm trying to devise a regex that matches from the first double-quote character found to the next double-quote character that isn't part of a pair; but for some reason, I'm having no luck. Here's what I tried:
>
>   /"(.*?)"(?!")/
>
> Sample text:
>
>   author: "Jonathan ""Dataweaver"" Lang" key=val
>
> What I'm getting for $1 in the first match:
>
>   Jonathan "
>
> What I'm looking for:
>
>   Jonathan ""Dataweaver"" Lang
>
> What did I miss, and how can I most efficiently perform the desired match?

Your regex looks for the first double-quote and then captures everything after that up to the first subsequent double-quote that isn't followed immediately by another one. The second quote of the pair before 'Dataweaver' matches this criterion so your regex captures up to the character before it.

This:

    $str =~ /"((?:.*?"")*.*?)"/;

should do what you want. After finding the first double-quote it captures all following sequences ending in a pair of double quotes, plus anything after those up to the closing quote.

HTH,

Rob
Re: regex help
Rob Dixon wrote:
> Jonathan Lang wrote:
>> I'm trying to devise a regex that matches from the first double-quote character found to the next double-quote character that isn't part of a pair; but for some reason, I'm having no luck. Here's what I tried:
>>
>>   /"(.*?)"(?!")/
>>
>> Sample text:
>>
>>   author: "Jonathan ""Dataweaver"" Lang" key=val
>>
>> What I'm getting for $1 in the first match:
>>
>>   Jonathan "
>>
>> What I'm looking for:
>>
>>   Jonathan ""Dataweaver"" Lang
>>
>> What did I miss, and how can I most efficiently perform the desired match?
>
> Your regex looks for the first double-quote and then captures everything after that up to the first subsequent double-quote that isn't followed immediately by another one. The second quote of the pair before 'Dataweaver' matches this criterion so your regex captures up to the character before it.
>
> This:
>
>   $str =~ /"((?:.*?"")*.*?)"/;
>
> should do what you want. After finding the first double-quote it captures all following sequences ending in a pair of double quotes, plus anything after those up to the closing quote.

Ah. I had tried /"((.*?"")*.*?)"/ and hadn't gotten it to work; it never occurred to me to try the non-capturing group instead.

Thank you.

--
Jonathan "Dataweaver" Lang
Re: regex help
Jonathan Lang wrote:
> Rob Dixon wrote:
>> Jonathan Lang wrote:
>>> I'm trying to devise a regex that matches from the first double-quote character found to the next double-quote character that isn't part of a pair; but for some reason, I'm having no luck. Here's what I tried:
>>>
>>>   /"(.*?)"(?!")/
>>>
>>> Sample text:
>>>
>>>   author: "Jonathan ""Dataweaver"" Lang" key=val
>>>
>>> What I'm getting for $1 in the first match:
>>>
>>>   Jonathan "
>>>
>>> What I'm looking for:
>>>
>>>   Jonathan ""Dataweaver"" Lang
>>>
>>> What did I miss, and how can I most efficiently perform the desired match?
>>
>> Your regex looks for the first double-quote and then captures everything after that up to the first subsequent double-quote that isn't followed immediately by another one. The second quote of the pair before 'Dataweaver' matches this criterion so your regex captures up to the character before it.
>>
>> This:
>>
>>   $str =~ /"((?:.*?"")*.*?)"/;
>>
>> should do what you want. After finding the first double-quote it captures all following sequences ending in a pair of double quotes, plus anything after those up to the closing quote.
>
> Ah. I had tried /"((.*?"")*.*?)"/ and hadn't gotten it to work; it never occurred to me to try the non-capturing group instead.

That also works! (But is performing unnecessary and wasteful captures.)

Rob

    use strict;
    use warnings;

    my $str = q(author: "Jonathan ""Dataweaver"" Lang" key=val);
    $str =~ /"((.*?"")*.*?)"/;
    print $1, "\n";

**OUTPUT**

Jonathan ""Dataweaver"" Lang
Re: Regex help
On 3 Sep 2007 at 17:44, Rob Dixon wrote:
> Beginner wrote:
>> I am trying to come up with a regex to squash multiple commas into one. The line I am working on looks like this:
>>
>> SPEED OF LIGHT, , LIGHT SPEED,TRAVEL,TRAVELLING, , DANGER,DANGEROUS,PHYSICAL, , CONCEPT,CONCEPTS, , , , , , , , , ,
>>
>> There are instances of /,\s{1,},/ and /,,/
>>
>> The bit that I am struggling with is finding a way to use a multiplier for the regex /,\s+/ but I have to be careful not to remove single entries. I guess the order of my substitutions is important here.
>>
>> Can anyone offer any tips please?
>
> Hey Dermot.
>
> I think just
>
>   $text =~ s/,[,\s]+/,/g;

Indeed Rob that works too. You've used square brackets for what I think they call 'alternation'; the next character might be a comma or a whitespace. I have always thought of square brackets as being for character classes, e.g. [a-z]. I associate alternation with parentheses and the pipe, /(this|that)/.

perlrequick demos examples like:

    /[a-z]+\s+\d*/;  # match a lowercase word, at least some space, and
                     # any number of digits

but I don't think I've seen examples where there is a character class like \s or \w within square brackets before.

Anyway, back to reading perlretut, perlop and others.

Thanx,
Dp.
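On the question raised above: the square brackets are still an ordinary character class, and shorthand classes such as \s are allowed inside one. A short sketch (with a cut-down, made-up input string) showing that the class and an explicit alternation give the same result here:

```perl
use strict;
use warnings;

my $text = 'A, , B,C, , , ';

# Character class: a comma followed by one or more commas or whitespace.
(my $with_class = $text) =~ s/,[,\s]+/,/g;

# The same thing spelled as an alternation inside a non-capturing group.
(my $with_alt = $text) =~ s/,(?:,|\s)+/,/g;

print "$with_class\n";   # A,B,C,
print "$with_alt\n";     # A,B,C,
```

The class is the more usual spelling when every alternative is a single character. Note that a lone comma followed only by spaces loses its spaces too, and a trailing comma remains.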
Re: Regex help
Beginner wrote:
> Hi,

Hello,

> I am trying to come up with a regex to squash multiple commas into one. The line I am working on looks like this:
>
> SPEED OF LIGHT, , LIGHT SPEED,TRAVEL,TRAVELLING, , DANGER,DANGEROUS,PHYSICAL, , CONCEPT,CONCEPTS, , , , , , , , , ,
>
> There are instances of /,\s{1,},/ and /,,/
>
> The bit that I am struggling with is finding a way to get a use a multiplier for the regex /,\s+/ but I have to be careful not to remove single entries. I guess the order of my substitutions is important here.

    $ perl -le'
    $_ = q[SPEED OF LIGHT, , LIGHT SPEED,TRAVEL,TRAVELLING, , DANGER,DANGEROUS,PHYSICAL, , CONCEPT,CONCEPTS, , , , , , , , , , ];
    print;
    s/,\s*(?=,)//g;
    print;
    '
    SPEED OF LIGHT, , LIGHT SPEED,TRAVEL,TRAVELLING, , DANGER,DANGEROUS,PHYSICAL, , CONCEPT,CONCEPTS, , , , , , , , , ,
    SPEED OF LIGHT, LIGHT SPEED,TRAVEL,TRAVELLING, DANGER,DANGEROUS,PHYSICAL, CONCEPT,CONCEPTS,

    $ perl -le'
    $_ = q[SPEED OF LIGHT, , LIGHT SPEED,TRAVEL,TRAVELLING, , DANGER,DANGEROUS,PHYSICAL, , CONCEPT,CONCEPTS, , , , , , , , , , ];
    print;
    $_ = join ",", grep /\S/, split /,/;
    print;
    '
    SPEED OF LIGHT, , LIGHT SPEED,TRAVEL,TRAVELLING, , DANGER,DANGEROUS,PHYSICAL, , CONCEPT,CONCEPTS, , , , , , , , , ,
    SPEED OF LIGHT, LIGHT SPEED,TRAVEL,TRAVELLING, DANGER,DANGEROUS,PHYSICAL, CONCEPT,CONCEPTS

John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.                            -- Larry Wall
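The two one-liners above differ only in how they treat the end of the line. A sketch on a shortened, made-up input makes that visible:

```perl
use strict;
use warnings;

my $text = 'A, , B,C, , , ';

# Lookahead: delete a comma (plus trailing whitespace) only when another
# comma follows, so the last comma of every run survives.
(my $lookahead = $text) =~ s/,\s*(?=,)//g;

# Split on commas, keep the fields containing a non-space character,
# and join them again; empty fields at the end simply disappear.
my $joined = join ',', grep { /\S/ } split /,/, $text;

print "[$lookahead]\n";   # [A, B,C, ]
print "[$joined]\n";      # [A, B,C]
```

The substitution keeps one trailing comma and the whitespace after it, whereas the split/grep/join version ends cleanly on the last real field.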
RE: Regex help
Christ That's certainly 1 way ;)

-----Original Message-----
From: John W. Krahn [mailto:[EMAIL PROTECTED]
Sent: 03 September 2007 16:11
To: Perl beginners
Subject: Re: Regex help
RE: Regex help
Think

    s/(\,+\s*)+/,/g;

should work. It produces

SPEED OF LIGHT,LIGHT SPEED,TRAVEL,TRAVELLING,DANGER,DANGEROUS,PHYSICAL,CONCEPT,CONCEPTS

if that's what you want.

-----Original Message-----
From: Andrew Curry
Sent: 03 September 2007 16:14
To: 'John W. Krahn'; Perl beginners
Subject: RE: Regex help

Christ That's certainly 1 way ;)

[snip]
RE: Regex help
On 3 Sep 2007 at 16:15, Andrew Curry wrote:
> Think
>
>     s/(\,+\s*)+/,/g;
>
> should work. It produces
>
> SPEED OF LIGHT,LIGHT SPEED,TRAVEL,TRAVELLING,DANGER,DANGEROUS,PHYSICAL,CONCEPT,CONCEPTS
>
> if that's what you want.

Exactly what I want. Thanx,
Dp.
RE: Regex help
On 3 Sep 2007 at 16:12, Andrew Curry wrote:
> [snip]
> s/,\s*(?=,)//g;
> [snip]
> $_ = join ",", grep /\S/, split /,/;
> [snip]
> John

Okay, I need to ask what's going on here. I had to use the s/,\s*(?=,)//g
expression because the s/(\,+\s*)+/,/g; regex in my code snip wasn't working
as it did on the text snippet I originally supplied.

=== code snip ===
while (<FH>) {
    chomp($_);
    s/"//g;
    s/\t/, /g;
    s/,\s*(?=,)//g;
    print "\"$_\"\n";
}
==

I can understand the 2nd method: a grouped, literal comma (\,), one or more
times, followed by zero or more spaces.

The 2nd regex reads to me like a comma, then zero or more spaces, but what's
that (?=,) doing? Is it referring to the preceding expression and saying if it
matches up to 1 time? I can't see what the equal sign is doing either.

Enlightenment please.
Dp.
Re: Regex help
Beginner wrote:
> On 3 Sep 2007 at 16:12, Andrew Curry wrote:

Please do not attribute to Andrew Curry a post that was actually submitted by
me (see my name at the end there.) TIA

>> [snip]
>> John
>
> Okay I need to ask what's going on here. I had to use the s/,\s*(?=,)//g
> expression because the s/(\,+\s*)+/,/g; regex in my code snip wasn't working
> as it did on the text snippet I originally supplied.

"wasn't working" is not a very good description of the problem.

> === code snip ===
> while (<FH>) {
>     chomp($_);

Why remove the newline and then add it back at the end of the loop?

>     s/"//g;

It is more efficient to use transliteration to remove characters from a
string:

    tr/"//d;

>     s/\t/, /g;
>     s/,\s*(?=,)//g;
>     print "\"$_\"\n";

You could use different quoting so you don't have to escape the quotation
marks:

    print qq["$_"\n];

> }
> ==
> I can understand the 2nd method: A grouped, literal comma (\,), one or more
> times followed by a zero or more spaces.
>
> The 2nd regex reads to me like, a comma then zero or more spaces but what's
> that (?=,) doing?

It is a zero-width positive look-ahead assertion. It says that a comma *must*
follow the pattern but is not included as part of the pattern.

> Is it referring to the preceding expression and saying if it matches up to
> 1 time? I can't see what the equal sign is doing either.
>
> Enlightment please.

perldoc perlre

John
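To make the look-ahead explanation above concrete, here is a minimal sketch. The sub names and the sample string are made up for illustration and are not from the thread. It contrasts the look-ahead pattern with a variant that consumes the second comma, so you can see why "zero-width" matters:

```perl
use strict;
use warnings;

# Hypothetical helper: delete each comma (plus any whitespace after it)
# that is directly followed by another comma.
sub squash_commas {
    my ($text) = @_;
    # (?=,) only checks that a comma comes next; it does not consume it,
    # so that comma is still there for the next match attempt.
    $text =~ s/,\s*(?=,)//g;
    return $text;
}

# Same idea WITHOUT the look-ahead: the second comma is consumed and
# deleted as well, so the separator between two entries disappears.
sub squash_commas_consuming {
    my ($text) = @_;
    $text =~ s/,\s*,//g;
    return $text;
}

my $sample = 'A, , B,C, , , D';
print squash_commas($sample), "\n";            # A, B,C, D
print squash_commas_consuming($sample), "\n";  # A B,C , D
```

With the look-ahead, exactly one comma survives from each run; without it, the separator is lost whenever a run contains exactly two commas.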
Re: Regex help
Beginner wrote:
> I am trying to come up with a regex to squash multiple commas into one.
> The line I am working on looks like this:
>
> SPEED OF LIGHT, , LIGHT SPEED,TRAVEL,TRAVELLING, , DANGER,DANGEROUS,PHYSICAL, , CONCEPT,CONCEPTS, , , , , , , , , ,
>
> There are instances of /,\s{1,},/ and /,,/
>
> The bit that I am struggling with is finding a way to use a multiplier
> for the regex /,\s+/ but I have to be careful not to remove single
> entries. I guess the order of my substitutions is important here.
>
> Can anyone offer any tips please?

Hey Dermot.

I think just

    $text =~ s/,[,\s]+/,/g;

is all you need.

Rob
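A quick sketch of how the character-class substitution above behaves. The sub name and the sample strings are assumptions made for illustration. One thing worth noticing is that the class [,\s] also matches the whitespace after a single comma, so ", " becomes "," even where there was no run of commas:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution suggested above.
sub collapse_commas {
    my ($text) = @_;
    # A comma followed by one or more commas/whitespace characters
    # is replaced by a single comma.
    $text =~ s/,[,\s]+/,/g;
    return $text;
}

print collapse_commas('A, , B,C, D'), "\n";   # A,B,C,D
print collapse_commas('A,B'), "\n";           # A,B  (nothing to collapse)
```

If the space after a lone comma should be preserved, the look-ahead version s/,\s*(?=,)//g from earlier in the thread leaves it alone.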
Re: Regex help
Andrew Curry wrote:
> Think
>
>     s/(\,+\s*)+/,/g;

Commas are not special in a regular expression so there is no need to escape
them. You are using capturing parentheses but are not using the string
captured in $1; it is better to use non-capturing parentheses.

    s/(?:,+\s*)+/,/g;

A quantified pattern inside a quantified group is inefficient and could bomb
if the string is long enough:

$ perl -le'
$_ = "," x 100_000;
s/(?:,+\s*)+/,/g;
print;
'
Segmentation fault (core dumped)

$ perl -le'
$_ = "," x 100_000;
s/,\s*(?=,)//g;
print;
'
,

John
Re: Regex help
[ Please do not top-post. TIA ]

[EMAIL PROTECTED] wrote:
> Hi

Hello,

> Unless Perl is the only tool available to you in your toolbox and if you're
> running Linux or similar consider the tr -s command in a shell.

Perl also has that:

    tr/,//s;

perldoc perlop

> However if you are strictly limited to Perl then this stand regex works:-
>
> echo ,|perl -ane 's/,*/,/;print'

The OP's string also included spaces after the commas.

Why are you using the -a switch, which splits the current line on whitespace
and stores it in the @F array, when you are not using the @F array? Why use
the -n switch and 'print' instead of just using the -p switch?

$ echo abc | perl -ane 's/,*/,/;print'
,abc

You are using a quantifier that matches zero times, so you are adding commas
where none existed before.

John
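The two points above (tr with the /s flag, and the zero-length match of ,*) can be tried out with a small sketch. The sub names and sample strings are invented for this example:

```perl
use strict;
use warnings;

# Squeeze runs of commas with tr///s. With an empty replacement list,
# tr maps each comma to itself and /s collapses repeats into one.
sub squeeze_commas {
    my ($text) = @_;
    $text =~ tr/,//s;
    return $text;
}

# ,* can match the empty string, so this inserts a comma at the start
# of a string that contains none.
sub star_version {
    my ($text) = @_;
    $text =~ s/,*/,/;
    return $text;
}

# ,+ needs at least one comma, so a string without commas is untouched.
sub plus_version {
    my ($text) = @_;
    $text =~ s/,+/,/;
    return $text;
}

print squeeze_commas('a,,,b, ,c'), "\n";  # a,b, ,c  (spaces break a run)
print star_version('abc'), "\n";          # ,abc
print plus_version('abc'), "\n";          # abc
```

Note that tr/,//s only squeezes adjacent commas; runs separated by whitespace, as in the OP's data, still need one of the regex approaches.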
RE: Regex help
Hi

Unless Perl is the only tool available to you in your toolbox and if you're
running Linux or similar consider the tr -s command in a shell.

However if you are strictly limited to Perl then this stand regex works:-

echo ,|perl -ane 's/,*/,/;print'

Try it by cutting and pasting it.

No doubt you'll get lots of other answers, so choose the one you like best,
and can remember.

--
Andrew in Edinburgh, Scotland

On Mon, 3 Sep 2007, Andrew Curry wrote:
> Christ That's certainly 1 way ;)
>
> [snip]
Re: regex help
On 08/21/2007 07:41 AM, Tony Heal wrote:
> the list is a list of files by version. I need to keep the last 5 versions.
>
> Jeff's code works fine except I am getting some empty strings at the
> beginning that I have not figured out.
>
> [snip]
>
> Tony Heal

[oops, sent to the wrong list before]

Sort::Maker should make short work of this task ;-) All you have to do is
make a regex to pull out the version numbers. After that, you're practically
done:

use strict;
use warnings;
require Sort::Maker;

open (pkgREPO, '<', 'data/versions-list.txt')
    or die "no versions list: $!";
my @versions;
while (my $line = <pkgREPO>) {
    chomp $line;
    push @versions, [ $line, $line =~ /^(\d+)(?:-[a-z]+)?\.(\d+)-(\d+)/ ];
}
close pkgREPO;

my $sorter = Sort::Maker::make_sorter(
    'ST',
    number => '$_->[1]',
    number => '$_->[2]',
    number => '$_->[3]',
);
die $@ unless $sorter;

my @sorted = $sorter->(@versions);
print "keep: $_->[0]\n" for @sorted[$#sorted-4 .. $#sorted];
print "delete: $_->[0]\n" for @sorted[0 .. $#sorted-5];

__END__

This is the output:

keep: 16.5-2
keep: 16-special.5-2
keep: 16.5-10
keep: 16.5-13
keep: 16-special.6-6
delete: 14-special.1-2
delete: 14-special.1-8
delete: 14-special.1-15
delete: 14-special.2-40
delete: 14-special.2-41
delete: 14-special.3-4
delete: 14-special.3-7
delete: 14-special.3-12
delete: 15-special.1-52
delete: 15-special.1-53
delete: 15-special.1-54
delete: 15.2-108
delete: 15.2-110
delete: 15.2-111
delete: 15.3-12
delete: 16.1-17
delete: 16.1-22
delete: 16.1-23
delete: 16.1-39
delete: 16.3-1
delete: 16.3-6
delete: 16.3-7
delete: 16.3-8
delete: 16.3-15
delete: 16-special.4-9
delete: 16-special.4-10
delete: 16.5-1
delete: 16-special.5-1
Re: regex help
Jeff Pang wrote:
> -----Original Message-----
> From: "Mr. Shawn H. Corey"
> Sent: Aug 21, 2007 12:32 PM
> To: Jeff Pang
> Cc: beginners@perl.org
> Subject: Re: regex help
>
>> Jeff Pang wrote:
>>> my @new = map { $_->[0] }
>>>           sort { $b->[1] <=> $a->[1] }
>>>           map { [$_,(split/-/)[-1]] } @arr;
>>> print "@new[0..4]";
>>
>> Fails; this would put '15-special.3-45' before '15-special.1-51'
>
> Well, have you tested the code before saying this? I sort it based on the
> last number field, split by '-'. It works fine for me.

The point is that only the OP can say what is significant. And s/he hasn't.

--
Just my 0.0002 million dollars worth,
Shawn

"For the things we have to learn before we can do them, we learn by doing
them."
Aristotle
RE: regex help
the list is a list of files by version. I need to keep the last 5 versions.

Jeff's code works fine except I am getting some empty strings at the beginning
that I have not figured out. Here is what I have so far. Lines 34 and 39
provide a printout for troubleshooting. Once I get this fixed all I need to do
is shift the top five from the list and unlink the rest.

#!/usr/bin/perl
use warnings;
use strict;

opendir (REPOSITORY, '/usr/local/repository/dists/');
my @repositories = readdir (REPOSITORY);
closedir (REPOSITORY);

my $packageRepo;
my @values;
my @newValues;

foreach (@repositories) {
    $packageRepo = $_;
    chomp ($packageRepo);
    opendir (packageREPO, "/usr/local/repository/dists/$packageRepo/non-free/binary-i386");
    my @repoFiles = readdir (packageREPO);
    close (packageREPO);
    foreach (@repoFiles) {
        my $fileName = $_;
        chomp ($fileName);
        if ( /(.*)(([0-9][0-9])(-special)?\.([0-9])(-)([0-9]*))(.*)/) {
            push (@values, $2);
        }
    }
    my %h;
    foreach (@values) { push (@newValues, $_) unless $h{$_}++ }
    foreach (@newValues){print "$_\n";}
    my @new = map { $_->[0] }
              sort { $b->[1] <=> $a->[1] }
              map { [$_,(split/-/)[-1]] } @newValues;
    print "@new[0..4]\n";
}

Or for a line numbered version http://rafb.net/p/asqgJo27.html

Tony Heal
RE: regex help
Here is a sample of the versions that I am using.

16.1-17 16.1-22 16.1-23 16.1-39 16.3-1 16.3-6 16.3-7 16.3-8 16.3-15
16.5-1 16.5-2 16.5-10 16.5-13 15.3-12 15.2-108
14-special.1-2 14-special.1-8 14-special.1-15 14-special.2-40 14-special.2-41
14-special.3-4 14-special.3-7 14-special.3-12 15.2-110 15.2-111
15-special.1-52 15-special.1-53 15-special.1-54
16-special.4-9 16-special.4-10 16-special.5-1 16-special.5-2 16-special.6-6

Tony Heal
Pace Systems Group, Inc.
800-624-5999
[EMAIL PROTECTED]

-----Original Message-----
From: Tony Heal
Sent: Tuesday, August 21, 2007 8:42 AM
To: beginners@perl.org
Subject: RE: regex help

[snip]
Re: regex help
On 8/21/07, Tony Heal wrote:
> Here is a sample of the versions that I am using.
snip

Just to clarify, you have a version string with the following format:

    {major}{custom tag}.{minor}-{build}

and you want the list sorted by major, then minor, then build.

#!/usr/bin/perl
use strict;
use warnings;

my @versions;
while (<DATA>) {
    chomp;
    die "invalid format"
        unless my ($major, $minor, $build) = /(\d+)(?:-.+)?\.(\d+)-(\d+)/;
    push @versions, [ $major, $minor, $build, $_ ];
}

print "$_->[-1]\n" for sort {
    $a->[0] <=> $b->[0] or
    $a->[1] <=> $b->[1] or
    $a->[2] <=> $b->[2]
} @versions;

__DATA__
16.1-17
16.1-22
16.1-23
16.1-39
16.3-1
16.3-6
16.3-7
16.3-8
16.3-15
16.5-1
16.5-2
16.5-10
16.5-13
15.3-12
15.2-108
14-special.1-2
14-special.1-8
14-special.1-15
14-special.2-40
14-special.2-41
14-special.3-4
14-special.3-7
14-special.3-12
15.2-110
15.2-111
15-special.1-52
15-special.1-53
15-special.1-54
16-special.4-9
16-special.4-10
16-special.5-1
16-special.5-2
16-special.6-6
RE: regex help
-----Original Message-----
From: Tony Heal
Sent: Aug 21, 2007 9:25 PM
To: beginners@perl.org
Subject: RE: regex help

> Here is a sample of the versions that I am using.
> [snip]

OK, try it this way. It sorts the versions from high to low and outputs the
first 5.

use strict;
use warnings;

my @arr = qw(
    16.1-17 16.1-22 16.1-23 16.1-39 16.3-1 16.3-6 16.3-7 16.3-8 16.3-15
    16.5-1 16.5-2 16.5-10 16.5-13 15.3-12 15.2-108
    14-special.1-2 14-special.1-8 14-special.1-15 14-special.2-40
    14-special.2-41 14-special.3-4 14-special.3-7 14-special.3-12
    15.2-110 15.2-111 15-special.1-52 15-special.1-53 15-special.1-54
    16-special.4-9 16-special.4-10 16-special.5-1 16-special.5-2
    16-special.6-6
);

my @new = map { $_->[0] }
          sort { $b->[1] <=> $a->[1] or
                 $b->[2] <=> $a->[2] or
                 $b->[3] <=> $a->[3] }
          map { [ $_, split/\D+/ ] } @arr;

print "@new[0..4]";

__END__

Good luck!

--
Jeff Pang
http://home.arcor.de/jeffpang/
Re: regex help
On 8/21/07, Jeff Pang wrote:
snip
> my @new = map { $_->[0] }
>           sort { $b->[1] <=> $a->[1] or
>                  $b->[2] <=> $a->[2] or
>                  $b->[3] <=> $a->[3] }
>           map { [ $_, split/\D+/ ] } @arr;
snip

While splitting on non-digits is a nifty solution, it would break if the
custom tag can contain a number (16-custom2.2-14). It is better to nail down
the version number scheme and write a regex that pulls the required info from
it and throws an error if a version does not match the scheme.
Re: regex help
On 8/21/07, Jeff Pang wrote:
>> [snip]
>> While splitting on non-number is a nifty solution, it would break if
>> the custom tag can contain a number (16-custom2.2-14).
>> [snip]
>
> Have you seen this case in his data?

I have seen a sampling of his data; if that is all of the data he has then he
can sort it by hand and doesn't need Perl. Experience has taught me to expect
the worst from data. You need to be able to detect (if not recover from)
malformed data, and your split /\D/ will just silently do the wrong thing
(well, there might be some undef warnings if the version were 12.4). GIGO* is
fine for custom-crafted one-liners, but production-quality code should at
least make an attempt to notice if the data is bad and signal the user/admin.

* Garbage In/Garbage Out
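The point about detecting malformed data could be sketched roughly as follows. The sub names are hypothetical, and the anchored pattern assumes that a custom tag starts with "-" and never contains a dot, which the thread does not confirm:

```perl
use strict;
use warnings;

# Hypothetical strict parser for {major}{custom tag}.{minor}-{build}.
# ASSUMPTION: the optional tag starts with "-" and contains no ".".
sub parse_version {
    my ($version) = @_;
    my ($major, $minor, $build) =
        $version =~ /\A(\d+)(?:-[^.]+)?\.(\d+)-(\d+)\z/
        or die "invalid version format: '$version'\n";
    return ($major, $minor, $build);
}

# The split-based approach, for comparison: no validation at all.
sub split_version {
    my ($version) = @_;
    return split /\D+/, $version;
}

print join(',', parse_version('16-custom2.2-14')), "\n";  # 16,2,14
print join(',', split_version('16-custom2.2-14')), "\n";  # 16,2,2,14

# A string that does not fit the scheme is reported, not silently sorted.
eval { parse_version('12.4') };
print "caught: $@" if $@;
```

The split version yields four fields for a tag containing a digit, so the minor and build numbers end up in the wrong slots without any warning, while the anchored regex either returns the three fields or dies with the offending string in the message.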
Re: regex help
-----Original Message-----
From: Chas Owens
Sent: Aug 21, 2007 10:01 PM
To: Jeff Pang
Cc: beginners@perl.org
Subject: Re: regex help

> On 8/21/07, Jeff Pang wrote:
> snip
>> my @new = map { $_->[0] }
>>           sort { $b->[1] <=> $a->[1] or
>>                  $b->[2] <=> $a->[2] or
>>                  $b->[3] <=> $a->[3] }
>>           map { [ $_, split/\D+/ ] } @arr;
> snip
>
> While splitting on non-number is a nifty solution, it would break if
> the custom tag can contain a number (16-custom2.2-14). It is better
> to nail down the version number scheme and write a regex that pulls
> the required info from it that throws an error if a version does not
> match the scheme.

Have you seen this case in his data?

--
Jeff Pang
RE: regex help
OK, I added this and I keep getting "invalid format":

    foreach (@newValues){print "$_\n";}
    my @versions;
    while (@newValues) {
        chomp;
        die "invalid format" unless
        my ($major, $minor, $build) = /(\d+)(?:-.+)?\.(\d+)-(\d+)/;
        push @versions, [ $major, $minor, $build , $_];
    }
    foreach (@versions){print "$_\n";}
}

/tmp# ./trim.pl
14.20-33
14.20-34
14.18-29
14.18-33
14.18-34
14.18-35
14.18-37
14.20-27
14.20-28
14.20-29
14.20-30
14.20-31
14.20-32
14.16-30
14.16-31
invalid format at ./trim.pl line 41.

(41 is the die line)

Sorry Chas, I first sent this to you and not the list.

Tony Heal

-----Original Message-----
From: Chas Owens
Sent: Tuesday, August 21, 2007 9:50 AM
Cc: beginners@perl.org
Subject: Re: regex help

[snip]
Re: regex help
On 8/21/07, Tony Heal wrote:
> OK I added this and I keep getting invalid format
>
>     foreach (@newValues){print "$_\n";}
>     my @versions;
>     while (@newValues) {
>         chomp;
>         die "invalid format" unless
>         my ($major, $minor, $build) = /(\d+)(?:-.+)?\.(\d+)-(\d+)/;
>         push @versions, [ $major, $minor, $build , $_];
>     }
>     foreach (@versions){print "$_\n";}
> }
snip

That would be because the code makes no sense. My example read a version at a
time from the DATA file handle, transformed it, and pushed it onto an array,
then sorted the array and printed it. Yours has all of the versions in an
array and tries to loop over the array with a while loop (doesn't work to
start with) and you never bother to sort the data. If you aren't reading from
a file then you might as well add the first loop back onto the Schwartzian
transform (map -> sort -> unmap).

Please note that

    die "bad format"
        unless my ($major, $minor, $build) = /(\d+)(?:-.+)?\.(\d+)-(\d+)/;

is one statement and should be indented as above. If you don't indent, it
looks like the die and the assignment are unrelated. If you find the style
confusing you may consider using this instead:

    my ($major, $minor, $build) = /(\d+)(?:-.+)?\.(\d+)-(\d+)/
        or die "bad format";

#!/usr/bin/perl
use strict;
use warnings;

#I don't know how you are getting these values
my @newValues = map { chomp; $_ } <DATA>;

print "unsorted\n";
print "$_\n" for @newValues;

@newValues =
    #unmap to recover the original data
    map { $_->[0] }
    #sort
    sort {
        $a->[1] <=> $b->[1] or
        $a->[2] <=> $b->[2] or
        $a->[3] <=> $b->[3]
    }
    #map into a sortable form
    map {
        die "bad format"
            unless my ($major, $minor, $build) = /(\d+)(?:-.+)?\.(\d+)-(\d+)/;
        [$_, $major, $minor, $build]
    } @newValues;

print "sorted\n";
print "$_\n" for @newValues;

__DATA__
16.1-17
16.1-22
16.1-23
16.1-39
16.3-1
16.3-6
16.3-7
16.3-8
16.3-15
16.5-1
16.5-2
16.5-10
16.5-13
15.3-12
15.2-108
14-special.1-2
14-special.1-8
14-special.1-15
14-special.2-40
14-special.2-41
14-special.3-4
14-special.3-7
14-special.3-12
15.2-110
15.2-111
15-special.1-52
15-special.1-53
15-special.1-54
16-special.4-9
16-special.4-10
16-special.5-1
16-special.5-2
16-special.6-6
Re: regex help
Tony Heal wrote on Tuesday, 21 August 2007:
>> -----Original Message-----
>> From: Chas Owens
>> Sent: Tuesday, August 21, 2007 9:50 AM
>> Cc: beginners@perl.org
>> Subject: Re: regex help
>>
>> [snip]
>> while (<DATA>) {
>>     chomp;
>>     die "invalid format"
>>         unless my ($major, $minor, $build) = /(\d+)(?:-.+)?\.(\d+)-(\d+)/;
>>     push @versions, [ $major, $minor, $build , $_];
>> }
>> [snip]

Hello Tony

Just include the original line in the die message to see what caused it (an
empty line would, for example). Based on that, you can then adapt the regex.

> OK I added this and I keep getting invalid format
>
>     foreach (@newValues){print "$_\n";}
>     my @versions;
>     while (@newValues) {
>         chomp;
>         die "invalid format" unless

          die "invalid format of '$_'" unless

>         my ($major, $minor, $build) = /(\d+)(?:-.+)?\.(\d+)-(\d+)/;
>         push @versions, [ $major, $minor, $build , $_];
>     }
>     foreach (@versions){print "$_\n";}
> }
>
> /tmp# ./trim.pl
> 14.20-33
> [snip]
> 14.16-31
> invalid format at ./trim.pl line 41.
>
> (41 is the die line)
Re: regex help
Tony Heal wrote:
> I have an array that will have these values. Each value is part of a
> file name. I need to keep the highest (numerically) 5 files and delete
> the rest. What is the easiest way to sort the array?

Break each file name into fields and sort by most significant field to least. Use the Schwartzian Transform http://en.wikipedia.org/wiki/Schwartzian_Transform to sort.

See:
    perldoc perlretut
    perldoc perlre

-- 
Just my 0.0002 million dollars worth,
Shawn

"For the things we have to learn before we can do them, we learn by doing them."
Aristotle
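The advice above (split into fields, sort most significant first, keep the top five) could be sketched roughly as follows. This is only an illustration under assumptions: the field layout {major}{custom tag}.{minor}-{build} and the regex are taken from elsewhere in this thread, the custom tag is ignored when ordering, and the helper name highest_versions is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $count highest versions, highest first.
sub highest_versions {
    my ($count, @names) = @_;
    my @sorted =
        map  { $_->[0] }                       # recover the original string
        sort {
            $b->[1] <=> $a->[1] or             # major, descending
            $b->[2] <=> $a->[2] or             # then minor
            $b->[3] <=> $a->[3]                # then build
        }
        map  {
            my @fields = /(\d+)(?:-.+)?\.(\d+)-(\d+)/
                or die "invalid format of '$_'";
            [ $_, @fields ]                    # [ original, major, minor, build ]
        } @names;
    $count = @sorted if $count > @sorted;      # never slice past the end
    return @sorted[ 0 .. $count - 1 ];
}

my @names = qw(16.1-17 16.3-15 16.5-13 15.2-111 14-special.3-12
               16-special.6-6 15-special.1-54);
my @keep = highest_versions(5, @names);
print "keep: @keep\n";
```

Everything not in @keep would then be a candidate for deletion; the sketch deliberately stops short of removing any files.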
Re: regex help
-----Original Message-----
From: Tony Heal [EMAIL PROTECTED]
Sent: Aug 21, 2007 5:50 AM
To: beginners@perl.org
Subject: regex help

> I have an array that will have these values. Each value is part of a
> file name. I need to keep the highest (numerically) 5 files and delete
> the rest. What is the easiest way to sort the array?

Well, it can be sorted, but by which field in the filename? The last numerical field? Here is one way:

    use strict;
    use warnings;

    my @arr = qw(14-special.4-32 14-special.4-32 14-special.4-33 14-special.4-33
                 15-special.1-51 15-special.1-51 15-special.1-52 15-special.1-52
                 15-special.1-52 15-special.1-53 15-special.1-53 15-special.1-53
                 15-special.1-54 15-special.1-54 15-special.3-44 15-special.3-44
                 15-special.3-45 15-special.3-45 15-special.4-4 15-special.4-4
                 15.2-100 15.2-100 15.2-104 15.2-104 15.2-124 15.2-124
                 15.2-65 15.2-65 15.2-66 15.2-66);

    my @new = map  { $_->[0] }
              sort { $b->[1] <=> $a->[1] }
              map  { [$_, (split /-/)[-1]] } @arr;

    print @new[0..4];

--
Jeff Pang - [EMAIL PROTECTED]
http://home.arcor.de/jeffpang/
Re: regex help
Jeff Pang wrote:
> use strict;
> use warnings;
>
> my @arr = qw(14-special.4-32 14-special.4-32 14-special.4-33 14-special.4-33
>              15-special.1-51 15-special.1-51 15-special.1-52 15-special.1-52
>              15-special.1-52 15-special.1-53 15-special.1-53 15-special.1-53
>              15-special.1-54 15-special.1-54 15-special.3-44 15-special.3-44
>              15-special.3-45 15-special.3-45 15-special.4-4 15-special.4-4
>              15.2-100 15.2-100 15.2-104 15.2-104 15.2-124 15.2-124
>              15.2-65 15.2-65 15.2-66 15.2-66);
>
> my @new = map  { $_->[0] }
>           sort { $b->[1] <=> $a->[1] }
>           map  { [$_, (split /-/)[-1]] } @arr;
>
> print @new[0..4];

Fails; this would put '15-special.3-45' before '15-special.1-51'

As I said, separate the data into fields, based on your knowledge of how to do it. (Nobody on this list knows how.) Then you can sort.

-- 
Just my 0.0002 million dollars worth,
Shawn

"For the things we have to learn before we can do them, we learn by doing them."
Aristotle
Re: regex help
-----Original Message-----
From: Mr. Shawn H. Corey [EMAIL PROTECTED]
Sent: Aug 21, 2007 12:32 PM
To: Jeff Pang [EMAIL PROTECTED]
Cc: beginners@perl.org
Subject: Re: regex help

> Jeff Pang wrote:
> > use strict;
> > use warnings;
> [snip code]
> > print @new[0..4];
>
> Fails; this would put '15-special.3-45' before '15-special.1-51'

Well, did you test the code before saying this? I sort on the last numeric field, split on '-'. It works fine for me.

--
Jeff Pang - [EMAIL PROTECTED]
http://home.arcor.de/jeffpang/
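To make the disagreement concrete, here is a small sketch comparing the two orderings on the pair of names mentioned above. The sub names are hypothetical, and the multi-field regex is the one posted elsewhere in this thread; which ordering is "right" depends on what "highest" is supposed to mean.

```perl
use strict;
use warnings;

# Descending sort on the last '-'-separated field only (the approach quoted above).
sub by_last_field {
    return map  { $_->[0] }
           sort { $b->[1] <=> $a->[1] }
           map  { [ $_, (split /-/)[-1] ] } @_;
}

# Descending sort on major, then minor, then build.
sub by_all_fields {
    return map  { $_->[0] }
           sort { $b->[1] <=> $a->[1] or $b->[2] <=> $a->[2] or $b->[3] <=> $a->[3] }
           map  { [ $_, /(\d+)(?:-.+)?\.(\d+)-(\d+)/ ] } @_;
}

my @pair = qw(15-special.3-45 15-special.1-51);

print "last field only: @{[ by_last_field(@pair) ]}\n";   # 15-special.1-51 first (51 > 45)
print "all fields:      @{[ by_all_fields(@pair) ]}\n";   # 15-special.3-45 first (minor 3 > 1)
```

So the last-field sort does run without error, but it ranks names by build number alone, which differs from a major/minor/build ordering whenever the minor fields differ.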
Re: regex help
Jeff Pang wrote:
> John W. Krahn:
>> Tony Heal:
>>> Why doesn't this work? I want to take any leading or trailing white
>>> spaces out.
>>
>> perldoc -q "How do I strip blank space"
>
> Or generally it could be done by,
> $string =~ s/^\s+|\s+$//g;

The g-modifier means neither "generally" nor "good". ;-)
Please see the suggested perldoc text for the proper ways.

I like to use:

    s/^\s+//, s/\s+$// for $string;

but

    $string =~ s/^\s+//;
    $string =~ s/\s+$//;

may be slightly faster (likely because there is no localization of $_).

-- 
Affijn, Ruud

"Gewoon is een tijger."
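For anyone who wants to try the two-substitution form quickly, a minimal sketch wrapped in a helper. The name trim is hypothetical (it is not a Perl builtin in the versions discussed here), and it returns a trimmed copy rather than editing in place.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string without
# leading or trailing whitespace.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;    # strip leading whitespace
    $string =~ s/\s+$//;    # strip trailing whitespace
    return $string;
}

print "[", trim("  \thello world \n"), "]\n";    # prints "[hello world]"
```

Note that a string made up only of whitespace comes back empty, and an empty string is left alone.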
RE: regex help
This works in a one-liner:

    $string =~ s/^\s*(.*\S)\s*$/$1/;

Cheers!
-Dan

-----Original Message-----
From: Dr.Ruud [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 08, 2007 2:05 PM
To: beginners@perl.org
Subject: Re: regex help

[snip]
Re: regex help
Dan Sopher wrote:
> This works in a one-liner:
>
> $string =~ s/^\s*(.*\S)\s*$/$1/;
>
> Cheers!

Let's compare Dan's one-liner to the solutions in the FAQ (perlfaq4):

    $ perl -le'
    for ( "\nX\n", "\nX", "X\n", "X", "\n\n\n", "\n", "" ) {
        $a = $b = $c = $_;
        $d  = $a =~ s/^\s*(.*\S)\s*$/$1/;
        $e  = $b =~ s/^\s+//;
        $e += $b =~ s/\s+$//;
        $f  = $c =~ s/^\s+|\s+$//g;
        print "Test: ", ++$g, " Length of original: ", length( $_ ), "\n",
              "Dan\047s length: ", length( $a ), " on a string that was", $d ? "" : " NOT", " modified.\n",
              "FAQ 1 length: ", length( $b ), " on a string that was", $e ? "" : " NOT", " modified.\n",
              "FAQ 2 length: ", length( $c ), " on a string that was", $f ? "" : " NOT", " modified.\n";
        }
    '
    Test: 1 Length of original: 3
    Dan's length: 1 on a string that was modified.
    FAQ 1 length: 1 on a string that was modified.
    FAQ 2 length: 1 on a string that was modified.

    Test: 2 Length of original: 2
    Dan's length: 1 on a string that was modified.
    FAQ 1 length: 1 on a string that was modified.
    FAQ 2 length: 1 on a string that was modified.

    Test: 3 Length of original: 2
    Dan's length: 1 on a string that was modified.
    FAQ 1 length: 1 on a string that was modified.
    FAQ 2 length: 1 on a string that was modified.

    Test: 4 Length of original: 1
    Dan's length: 1 on a string that was modified.
    FAQ 1 length: 1 on a string that was NOT modified.
    FAQ 2 length: 1 on a string that was NOT modified.

    Test: 5 Length of original: 3
    Dan's length: 3 on a string that was NOT modified.
    FAQ 1 length: 0 on a string that was modified.
    FAQ 2 length: 0 on a string that was modified.

    Test: 6 Length of original: 1
    Dan's length: 1 on a string that was NOT modified.
    FAQ 1 length: 0 on a string that was modified.
    FAQ 2 length: 0 on a string that was modified.

    Test: 7 Length of original: 0
    Dan's length: 0 on a string that was NOT modified.
    FAQ 1 length: 0 on a string that was NOT modified.
    FAQ 2 length: 0 on a string that was NOT modified.

John
-- 
Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order.
-- Larry Wall
Re: regex help
On 8/2/07, Tony Heal [EMAIL PROTECTED] wrote:
snip
> Why doesn't this work? I want to take any leading or trailing white
> spaces out. If I remove the remark it works, but I do not understand
> why it requires the second line
>
> $string =~ s/^(\s+)(.*)(\s+)$/$2/;
snip

Because (.*) matches all but the one space needed by the second (\s+). The . matches everything, including the spaces. You can fix this by saying

    $string =~ s/^(\s+)(.*?)(\s+)$/$2/;

to make (.*) match the smallest pattern (non-greedy) instead of the largest (greedy).
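A short sketch of the difference described above, run on a made-up sample string. The sub names strip_greedy and strip_lazy are hypothetical; the two patterns are the ones from this message.

```perl
use strict;
use warnings;

# Greedy: (.*) grabs as much as it can and leaves a single
# whitespace character for the trailing (\s+).
sub strip_greedy {
    my ($string) = @_;
    $string =~ s/^(\s+)(.*)(\s+)$/$2/;
    return $string;
}

# Non-greedy: (.*?) grabs as little as it can, so the trailing
# (\s+) takes all of the whitespace at the end.
sub strip_lazy {
    my ($string) = @_;
    $string =~ s/^(\s+)(.*?)(\s+)$/$2/;
    return $string;
}

my $sample = "   some text   ";
print "greedy: [", strip_greedy($sample), "]\n";    # [some text  ]  two trailing spaces remain
print "lazy:   [", strip_lazy($sample),   "]\n";    # [some text]
```

Both patterns require at least one whitespace character at each end, so a string with whitespace on only one side is left unchanged by either.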
Re: regex help
Tony Heal wrote:
> Why doesn't this work? I want to take any leading or trailing white
> spaces out. If I remove the remark it works, but I do not understand
> why it requires the second line

For reference, perldoc perlre and search for greedy.

Basically, the .* matches as much as possible, so it gets the spaces as well. To make it not greedy, you add a ?, so

    $string =~ s/^\s+(.*?)\s+$/$1/;

would work.

Hope this helps,
Ricky
RE: regex help
So since '?' will match the last character, group, or class 0 or 1 time, then it matches the group of whatever happens to be in '.*' up to any spaces that are attached to the '$'. Is that correct?

Tony Heal

-----Original Message-----
From: Chas Owens [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 02, 2007 4:55 PM
To: [EMAIL PROTECTED]
Cc: beginners@perl.org
Subject: Re: regex help

[snip]
Re: regex help
On 8/2/07, Tony Heal [EMAIL PROTECTED] wrote:
> So since '?' will match the last character, group, or class 0 or 1 time
> the it matches the group of whatever happens to be in '.*' up to any
> spaces that are attached to the '$'. Is that correct?
snip

No, the ? in .*? is not the same as the ? in [abc]?, just as neither of them is the same as the ? in (?foo). The character is being reused, but the meanings are completely separate. The ? character when used after a quantifier (i.e. *, +, ?, {n}, or {n,m}) means match the smallest possible string (non-greedy). The default for those quantifiers is to match the largest string possible (greedy).

from perldoc perlre:

    The following standard quantifiers are recognized:

        *      Match 0 or more times
        +      Match 1 or more times
        ?      Match 1 or 0 times
        {n}    Match exactly n times
        {n,}   Match at least n times
        {n,m}  Match at least n but not more than m times

    snip

    By default, a quantified subpattern is "greedy", that is, it will
    match as many times as possible (given a particular starting
    location) while still allowing the rest of the pattern to match.
    If you want it to match the minimum number of times possible,
    follow the quantifier with a "?". Note that the meanings don't
    change, just the "greediness":

        *?     Match 0 or more times
        +?     Match 1 or more times
        ??     Match 0 or 1 time
        {n}?   Match exactly n times
        {n,}?  Match at least n times
        {n,m}? Match at least n but not more than m times
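The greedy and non-greedy forms listed above can be compared side by side with a small sketch. The helper name first_capture and the sample string "aaaa" are made up for illustration; each pattern carries one capture group so the matched text can be printed.

```perl
use strict;
use warnings;

# Hypothetical helper: return what the first capture group of
# $pattern matched in $text (undef if there was no match).
sub first_capture {
    my ($text, $pattern) = @_;
    my ($capture) = $text =~ $pattern;
    return $capture;
}

my $text = "aaaa";

# Each greedy quantifier next to its non-greedy counterpart
my @patterns = (
    qr/(a*)/,     qr/(a*?)/,
    qr/(a+)/,     qr/(a+?)/,
    qr/(a?)/,     qr/(a??)/,
    qr/(a{2,})/,  qr/(a{2,}?)/,
    qr/(a{2,3})/, qr/(a{2,3}?)/,
);

for my $pattern (@patterns) {
    printf "%-20s captures '%s'\n", $pattern, first_capture($text, $pattern);
}
```

On "aaaa", for instance, a{2,3} captures "aaa" while a{2,3}? captures "aa": the set of allowed repeat counts is the same, and only the preference between them changes.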