Re: trouble doing regex in file containing both ascii and binary content
Hi Greg,

This list is all but dead -- it may be that you and I are the only people receiving mail from it. Much better, IMO, to post these types of questions to perlmonks.

Anyway ... this might help:

    use strict;
    use warnings;

    my $str = "\x1F\x8B\x08";
    print "String contains: $str\n";

    # Write the three bytes to a file in binary mode
    open WR, '>', 'file.bin' or die $!;
    binmode WR;
    print WR $str;
    close WR or die $!;

    # Slurp mode: read the whole file in one go
    undef $/;

    open RD, '<', 'file.bin' or die $!;
    binmode RD;
    my $contents = <RD>;
    close RD or die $!;

    if($contents =~ /$str/){print "ok 1\n"}

    # To safeguard against presence of
    # metacharacters in $str:
    if($contents =~ /\Q$str\E/){print "ok 2\n"}

Cheers,
Rob

From: Greg VisionInfosoft
Sent: Saturday, February 15, 2014 9:41 AM
To: Perl-Win32-Users@listserv.ActiveState.com
Subject: trouble doing regex in file containing both ascii and binary content

[snip]

___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
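For the extraction step Greg describes (cut the gzip body out of the saved stream, then decompress it), a minimal sketch follows. It is not from the thread: it assumes a Perl that ships IO::Compress::Gzip and IO::Uncompress::Gunzip, and it builds a small stand-in stream in memory (the `$headers` and `$plain` values are made up) instead of reading the Wireshark file. It uses index() rather than a regex, so metacharacters are not a concern.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical stand-in for the saved stream: text headers plus a gzip body.
my $plain   = "hello from the server\n";
my $headers = "HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n";
my ($gz, $text);
gzip \$plain => \$gz or die "gzip failed: $GzipError";
my $stream = $headers . $gz;

# Locate the gzip magic bytes (1F 8B) followed by the deflate method byte (08).
my $pos = index($stream, "\x1F\x8B\x08");
die "gzip magic not found\n" if $pos < 0;

# Everything from the magic bytes onward is the compressed payload.
my $payload = substr($stream, $pos);
gunzip \$payload => \$text or die "gunzip failed: $GunzipError";

print "payload starts at byte $pos\n";
print $text;
```

With a real capture, `$stream` would instead come from a file opened with binmode and read in slurp mode, as in the scripts above.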
Re: trouble doing regex in file containing both ascii and binary content
I haven't had time to test my theory, so I didn't respond. Since he had said text and binary, my thought was that the regex would not match past the first linefeed and would need to be updated accordingly.

On 2/15/2014 6:33 AM, sisyph...@optusnet.com.au wrote:

[snip]
trouble doing regex in file containing both ascii and binary content
I can't figure out what I'm doing wrong here.

I ran Wireshark to monitor a small HTTP client/server query/response. The point of the exercise is to see exactly what an AJAX response looks like (as I'm trying to learn AJAX).

Unfortunately, the AJAX response is sent from the server in 'gzip' format (not plain text), so Wireshark shows two standard HTTP headers, and at the end of the stream is the small binary 'gzipped' stream.

I've saved this Wireshark TCP 'stream' to a file. Viewing the file in hex mode, I see clearly that the first three binary bytes of the gzipped stream are hex 1F, hex 8B, hex 08.

What I need to do next is save just the binary gzipped stream to a stand-alone file, then see if I can un-gzip it to read the plain text contents. In theory, a straightforward task.

I wrote a quick few-line Perl script, whereby I open the saved Wireshark TCP stream file, set this input file to binary mode (so as to not change any internal binary byte values), undefine the input line separator (to slurp the entire file into memory when read), read the file to slurp its contents into a var, do a simple pattern match of \x1F\x8B\x08, then save the matched pattern $& and what follows the match $' to a new file... (right now the script doesn't actually output to a file yet, it just dumps to screen).

For reasons that elude me, the pattern match fails. I know the 3 bytes are in the file, yet the pattern match to those 3 bytes fails. Any ideas?

Here's the small script.

    open(IN, $ARGV[0]) || die "cant open input file";
    binmode(IN);
    undef $/;
    my $data = <IN>;
    if ($data =~ /\x1F\x8B\x08/) {
        print "matched: " . $& . $';
    } else {
        print "no match\n";
    }

The contents of the Wireshark stream are as follows...

POST /ajax/demo_post.asp HTTP/1.1
Host: www.w3schools.com
Connection: keep-alive
Content-Length: 0
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36
Origin: http://www.w3schools.com
Accept: */*
Referer: http://www.w3schools.com/ajax/tryajax_post.htm
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cookie: ASPSESSIONIDAASDBBTC=BFEPJKCDLGDHEEOJIKANOEHP

HTTP/1.1 200 OK
Cache-Control: private,public
Content-Type: text/html
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Fri, 14 Feb 2014 21:03:48 GMT
Content-Length: 201

.`.I.%/m.{.J.J..t...`.$..@.iG#).*..eVe]f.@ ..{{;.N'...?\fd.l..J...!?~|.?...V.6_..U..u...y...t./_.I.y;.f..wWG.qBo.. ..Q.www.~..h.../..h.c...

Note: the binary data at the end is obviously not easily discerned here in ASCII mode. When I open this same file in a binary editor, the actual binary contents (displayed in hex) are as follows... (I've inserted an extra space to make the hex values easily discerned).

1f 8b 08 00 00 00 00 00 04 00 ed bd 07 60 1c 49
96 25 26 2f 6d ca 7b 7f 4a f5 4a d7 e0 74 a1 08
80 60 13 24 d8 90 40 10 ec c1 88 cd e6 92 ec 1d
69 47 23 29 ab 2a 81 ca 65 56 65 5d 66 16 40 cc
ed 9d bc f7 de 7b ef bd f7 de 7b ef bd f7 ba 3b
9d 4e 27 f7 df ff 3f 5c 66 64 01 6c f6 ce 4a da
c9 9e 21 80 aa c8 1f 3f 7e 7c 1f 3f 22 1e af 8e
de cc 8b 26 9d 56 cb 36 5f b6 e9 55 d6 a4 75 fe
8b d6 79 d3 e6 b3 74 dd 14 cb 8b b4 9d e7 e9 cb
2f 5f bf 49 17 79 3b af 66 e3 c7 77 57 47 bf 71
42 6f be b2 0d b3 f6 51 ba 77 77 77 ff ee de ce
ee 7e ba ff 68 e7 de a3 fd 87 e9 cb 2f d0 f4 ff
01 a8 9f 68 15 63 00 00 00
Re:Problem with regex
Hi Barry,

On Thu, Nov 10, 2011 at 2:34 AM, Barry Brevik bbre...@stellarmicro.com wrote:

> Below is some test code that will be used in a larger program.
>
> In the code below I have a regular expression whose intent is to look for a ", 1 or more characters, a comma, 1 or more characters, and a ", and replace the comma with |. (the white space is just for clarity).
>
> IAC, the regex works, that is, it matches, but it only replaces the final match. I have just re-read the camel book section on regexes and have tried many variations, but apparently I'm too close to it to see what must be a simple answer.
>
> BTW, if you guys think I'm posting too often, please say so.
>
> Barry Brevik
>
>     use strict;
>     use warnings;
>
>     my $csvLine = qq|"col , 1" , col___'2' , col-3, "col,4"|;
>     print "before comma substitution: $csvLine\n\n";
>     $csvLine =~ s/(\x22.+),(.+\x22)/$1|$2/s;
>     print "after comma substitution.: $csvLine\n\n";

Tobias already gave you a solution, and I also think using Text::CSV or Text::CSV_XS is way better for this task than plain regexes. For example, one day you might encounter a line that has an embedded " escaped using \. Then even if your regex worked earlier, this can kill it. And what if there was a | in the original string?

Nevertheless, let me also try to explain the issue that you had with the regex, as this can come up in other situations.

First, I'd probably use a plain " instead of \x22, as that will probably make it easier for the reader to see what you are looking for.

Second, the /s at the end probably has no value. It only changes the behavior of . to also match newlines. If you don't have newlines in your string (e.g. because you are processing a file line by line) then the /s has no effect. That makes this expression:

    $csvLine =~ s/(".+),(.+")/$1|$2/;

Then, before going on, you need to check what this really matches, so I replaced the above with

    if ($csvLine =~ s/(".+),(.+")/$1|$2/s ){
        print "match: $1$2\n";
    }

and got

    match: "col , 1" , col___'2' , col-3, "col4"

You see, the .+ is greedy: it matches from the first " as much as it can. You'd be better off telling it to match as little as possible by adding an extra ? after the quantifier.

    if ($csvLine =~ /(".+?),(.+?")/ ){
        print "match: $1$2\n";
    }

prints this:

    match: "col  1"

Finally, you need to do the substitution globally, so not only once but as many times as possible:

    $csvLine =~ s/(".+?),(.+?")/$1|$2/g;

And the output is

    after comma substitution.: "col | 1" , col___'2' , col-3, "col|4"

But again, for CSV files that can have embedded commas, it is better to use one of the real CSV parsers.

regards
Gabor

--
Gabor Szabo
http://szabgab.com/perl_tutorial.html
Re:Problem with regex
> Nevertheless let me also try to explain the issue that you had with the regex as this can come up in other situations.
>
> First, I'd probably use a plain " instead of \x22 as that will probably make it easier for the reader to see what you are looking for.

Wow. That is an incredible post. Yes, I've been convinced to use Text::CSV, but for some reason the ActiveState ppm does not actually install it. It complains about not being able to find some other module that it depends on.

I'm amazed at the train of thought you worked through in your post. I have been doing Perl for 12 years but obviously have failed to grasp some of the true power of more complicated expressions. I did intuit that I needed to use ?, but did not do it the way you did, so it did not work as expected. I appreciate the time you spent compiling your post.

Sometimes I feel like I'm the only one posting questions to the list, and it alarms me that the traffic is so low... I would really regret having this list go dormant, as I have learned so much, especially from reading threads I did not post. And the people are really friendly.

Barry Brevik
RE: Re:Problem with regex
-----Original Message-----
From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Barry Brevik
Sent: 17 November 2011 17:29
To: perl Win32-users
Subject: Re:Problem with regex

> ... Yes, I've been convinced to use Text::CSV, but for some reason the ActiveState ppm does not actually install it. It complains about not being able to find some other module that it depends on.

Don't know if it helps, but I seem to have installed Text::CSV_XS on Activestate 5.14.1 build 1401.

--
Brian Raven
Re:Problem with regex
>     $csvLine =~ s/(".+?),(.+?")/$1|$2/g;

For some reason this substitution does not seem to work all the time, depending on which fields have commas in them. I finally tinkered my way into this:

    $csvLine =~ s/("[^,]+?),([^,]+?")/$1|$2/g;

...which seems to work a little better, but will not deal with spaces between fields, which are not supposed to be there anyway.

Barry Brevik
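One likely reason the lazy version misbehaves is that ".+? can run past the closing quote of a field that contains no comma. A sketch of an alternative, not taken from the thread: match each double-quoted field as a whole and translate the commas inside it. It assumes quoted fields contain no escaped quotes, and the sample line is made up. It avoids the /r modifier, so it should also run on older Perls such as 5.8.

```perl
use strict;
use warnings;

# Hypothetical sample: one quoted field without a comma, one with two.
my $csvLine = qq|"col , 1" , col___'2' , "plain" , col-3, "a,b,c"|;

# For every "..." field, copy it, turn its commas into |, and put it back.
$csvLine =~ s{("[^"]*")}{ (my $field = $1) =~ tr/,/|/; $field }ge;

print "$csvLine\n";
```

Because [^"]* cannot cross a quote, each match is confined to a single quoted field, however many commas it holds.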
RE: Re:Problem with regex
> Don't know if it helps, but I seem to have installed Text::CSV_XS on Activestate 5.14.1 build 1401.

I'm on Perl 5.8.8 because my Perl DevKit only works on that version.

Barry Brevik
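Where Text::CSV cannot be installed, one possible fallback is Text::ParseWords, which ships with Perl. The sketch below is an assumption-laden illustration rather than a recommendation from the thread: the sample line is made up, has no whitespace around the separators, and contains no single quotes (parse_line treats those as quote characters too).

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical sample line with commas embedded in double-quoted fields.
my $line = q{"col , 1",col-2,col-3,"col,4"};

# Arguments: delimiter pattern, keep flag (0 = strip quotes), the line itself.
my @fields = parse_line(',', 0, $line);

print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

This is not a full CSV parser (no multi-line fields, no doubled-quote escaping), so a real CSV module remains the better choice when it is available.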
Problem with regex
Below is some test code that will be used in a larger program.

What I am trying to do is process lines from a CSV file where some of the 'cells' have commas embedded in them (see sample code below). I might have used Text::CSV, but as far as I can tell that module also cannot deal with embedded commas.

In the code below I have a regular expression whose intent is to look for a ", 1 or more characters, a comma, 1 or more characters, and a ", and replace the comma with |. (the white space is just for clarity).

IAC, the regex works, that is, it matches, but it only replaces the final match. I have just re-read the camel book section on regexes and have tried many variations, but apparently I'm too close to it to see what must be a simple answer.

BTW, if you guys think I'm posting too often, please say so.

Barry Brevik

    use strict;
    use warnings;

    my $csvLine = qq|"col , 1" , col___'2' , col-3, "col,4"|;
    print "before comma substitution: $csvLine\n\n";
    $csvLine =~ s/(\x22.+),(.+\x22)/$1|$2/s;
    print "after comma substitution.: $csvLine\n\n";
RE: Problem with regex
The whitespace around the separator characters is not allowed in strict CSV. Try the code below.

Cheers - Tobias

    use strict;
    use warnings;
    use Text::CSV;

    # allow_whitespace tolerates blanks around the separators
    my $csv = Text::CSV->new({ allow_whitespace => 1 });

    open my $fh, '<&DATA' or die "Can't access DATA: $!\n";
    while (my $row = $csv->getline($fh)) {
        print join("\n",@$row),"\n";
    }
    $csv->eof or $csv->error_diag();

    __END__
    "col , 1" , col___'2' , col-3, "col,4"

-----Original Message-----
From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Barry Brevik
Sent: Wednesday, November 09, 2011 5:35 PM
To: perl Win32-users
Subject: Problem with regex

[snip]
Help with regex
I am trying to truncate a string so that it is only 39 characters long. The application is a label printing routine, and the label is only long enough to print 39 characters.

I tried this (and many iterations), but it returns the entire string every time. Can anyone see what I'm doing wrong, or maybe suggest a better way?

    use strict;
    use warnings;

    my $txt = 'This is a string that is longer than thirty nine characters used for testing.';

    print "\nRunning a test of grabbing the 1st 39 characters of a string.\n";
    print "Test string.: $txt\n";
    $txt =~ s/^(.{1,39})/$1/;
    print "Resulting string: $txt\n";

Barry Brevik
RE: Help with regex
    $txt =~ s/^(.{1,39}).*$/$1/;

or

    $txt = substr($txt,0,39);

--T

-----Original Message-----
From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Barry Brevik
Sent: Thursday, June 30, 2011 11:49 AM
To: perl-win32-users@listserv.ActiveState.com
Subject: Help with regex

[snip]
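To make the difference between the attempts concrete, here is a small side-by-side sketch. The variable names are illustrative, and the /s flag on the regex version is an addition (so that . also matches newlines), not part of the reply above.

```perl
use strict;
use warnings;

my $txt = 'This is a string that is longer than thirty nine characters used for testing.';

# Original attempt: replaces the first 39 characters with themselves,
# so the rest of the string is left in place and nothing changes.
(my $unchanged = $txt) =~ s/^(.{1,39})/$1/;

# Also matching the remainder with .* means it gets replaced by nothing.
(my $by_regex = $txt) =~ s/^(.{1,39}).*$/$1/s;

# substr expresses the same intent directly.
my $by_substr = substr($txt, 0, 39);

print length($unchanged), " characters after the original attempt\n";
print length($by_regex),  " characters after the corrected regex\n";
print "$by_substr\n";
```

substr also copes with strings shorter than 39 characters: it simply returns the whole string.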
RE: Help with regex
Barry,

: I am trying to truncate a string so that it is only 39 characters long.
: The application is a label printing routine, and the label is only long
: enough to print 39 characters.

Wrong tool. Look for substr.

Joe

Joseph Discenza
Help with regex
Wow, thank you all for the many replies I received!!

Barry Brevik
Re: can a regex pattern match return the starting position of the match?
Please visit and sign this petition: International Campaign for reconstruction of Buddha's Statues, in Bamiyan
http://www.thepetitionsite.com/38/international-campaign-for-reconstruction-of-buddhas-statues-in-bamiyan/

From: Conor conor.l...@gmail.com
To: gai...@visioninfosoft.com
Cc: perl-win32-users@listserv.activestate.com
Sent: Thursday, April 14, 2011 2:08 PM
Subject: Re: can a regex pattern match return the starting position of the match?

[snip]
can a regex pattern match return the starting position of the match?
Given how smart Perl is, I was thinking there must be a function within Perl whereby, if one does a pattern match against a scalar, then in addition to the regex being able to return such built-in vars as $` (what precedes the match), $' (what follows the match), $1, etc., there is a built-in var that returns the position within the scalar where the match occurred?

Of course, if not, one may always evaluate length($`). I was just curious.
Re: can a regex pattern match return the starting position of the match?
Greg-

This question was answered on Stack Overflow:
http://stackoverflow.com/questions/87380/how-can-i-find-the-location-of-a-regex-match-in-perl

brian d foy's answer seems to be the best:

The built-in variables @- and @+ hold the start and end positions, respectively, of the last successful match. $-[0] and $+[0] correspond to the entire pattern, while $-[N] and $+[N] correspond to the $N ($1, $2, etc.) submatches.

-Conor

On Thu, Apr 14, 2011 at 10:38 AM, Greg Aiken gai...@visioninfosoft.com wrote:

[snip]
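A short sketch of the @- and @+ arrays described above, using a made-up test string. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "abc123def";
my ($start, $end, $group_start, $prefix_len);

if ($text =~ /(\d+)/) {
    $start       = $-[0];        # offset where the whole match begins
    $end         = $+[0];        # offset just past the end of the whole match
    $group_start = $-[1];        # offset where capture group 1 begins
    $prefix_len  = length($`);   # the length($`) approach from the question
    print "match spans offsets $start to $end\n";
}
```

Here $start and length($`) give the same number; @- has the advantage of not relying on $`, which historically carried a performance cost on older Perls.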
RE: regex like option *values*
-----Original Message-----
From: p sena [mailto:senapati2...@yahoo.com]
Sent: 05 March 2011 05:34
To: perl-win32-users@listserv.ActiveState.com; Brian Raven
Subject: RE: regex like option *values*

> > > __DATA__
> > > abc0[1-9].ctr.[pad,spd].set.in
> > > abc[01-22].ctr.[pad,spd].set.in
> > > abcL[1,2,3].ctr.[pad,spd].set.in
> > > abcL[1,2,3].ctr.[pad,spd].set.in
> > > abcL[1,2,3].ctr.[70,001].set.in
> > > ---
> > > It should work for lists of ranges, and ranges of strings as well as numbers.
> > > Regarding incorporating into Getopt::Long, see the Tips and Tricks section of the doco.
> >
> > Brian,
> > Can this solution be generalized to support values such as
> > --option_value=abc0[1-9].ctr.[pad,spd].set.in,xxx0[2-8].mmm.[rst,spd].afr.org
> > That is, the __DATA__ lines all appear on one line separated by commas (instead of newlines). Would it be more efficient to do this in expand_string(), or in the main while loop just before calling expand_string()?
>
> Replying back with a solution I can see. With option values like these it becomes difficult to do the usual thing, as below:
>
> GetOptions ("library=s" => \@libfiles);
> @libfiles = split(/,/,join(',',@libfiles));
>
> Such mixed strings can be parsed and returned as a list as below. In our context it would be called from the main program before the while loop. After that, the list's elements can be passed to the expand_xxx routine(s) one by one.
>
> # Arg: a string which is the option value, like
> # abc0[1-9].ctr.[pad,spd].set.in,xxx0[2-8].mmm.[rst,spd].afr.org,some more values...
> sub parse_mix_strings {
>     my @x = split (//, $_[0]);
>     my $bracket_close;
>     my $bracket_open;
>     my @elems;
>     my @hstrings;
>     for (@x) {
>         push @elems, $_;
>         if ($_ eq '[') { $bracket_open = 1; }
>         if ($_ eq ']') {
>             if ($bracket_open == 1) { $bracket_close = 1; $bracket_open = 0; }
>         }
>         if ($_ eq ',' && !$bracket_open && $bracket_close) {
>             $elems[$#elems] =~ s/,//;
>             push @hstrings, join('', @elems);
>             @elems = ();
>         }
>     }
>     push @hstrings, join('', @elems);
>     return @hstrings;
> }
>
> On *another note*, Getopt::Long could be used this way, I think?
>
> my %list;
> GetOptions('list=s%' => sub {
>     print "1 = $_[1] 2 = $_[2]\n";
>     push(@{$list{$_[1]}}, expand_string($_[2]))
> });
> print "Elems = ", scalar @{$list{add}}, "\n";  # debug
> print @{$list{add}}, "\n";                     # debug
> [skip]
>
> And the program can be called as:
> prog_name.pl --list add=abc0[1-2].src.spd.in --list add=volvo[1-5].jeep.sch.edu

Your first idea can be made simpler by choosing a different separator, as comma is already being used as a separator for the contents of your square brackets. A unique separator means that you only need to call split to get the individual strings that you want to expand.

Your second idea can also be simpler. For example...

my @list;
GetOptions('list=s' => sub { push @list, expand_string($_[1]); });

HTH

--
Brian Raven

Please consider the environment before printing this e-mail.

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please advise the sender immediately by reply e-mail and delete this message and any attachments without retaining a copy. Any unauthorised copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
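The two simplifications suggested in this reply can be sketched together. This is only an illustration under assumptions: ';' is an arbitrary choice of "unique separator", the expand_string() here is a cut-down stand-in for the one posted earlier in the thread, and GetOptionsFromArray is used just so the example runs without a real command line.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Cut-down stand-in for the thread's expand_string(): expands the first
# [...] group (comma list and/or lo-hi ranges) and recurses on the rest.
sub expand_string {
    my ($str) = @_;
    return $str unless $str =~ /^(.*?)\[([^\]]+)\](.*)$/;
    my ($pre, $list, $post) = ($1, $2, $3);
    my @out;
    for my $item (split /\s*,\s*/, $list) {
        my @bits = ($item);
        if ($item =~ /^([^-]+)-([^-]+)$/) {
            my ($lo, $hi) = ($1, $2);   # copy captures before using them
            @bits = ($lo .. $hi);
        }
        push @out, map { expand_string($pre . $_ . $post) } @bits;
    }
    return @out;
}

# Hypothetical command line: one --list option, patterns separated by ';'
my @args = ('--list', 'abc0[1-2].src.[pad,spd].in;volvo[1-3].jeep.sch.edu');

my @list;
GetOptionsFromArray(
    \@args,
    'list=s' => sub {
        my ($name, $value) = @_;
        # A separator that never occurs inside [...] makes a plain split enough.
        push @list, map { expand_string($_) } split /;/, $value;
    },
) or die "bad options\n";

print "$_\n" for @list;
```

On a Unix-like shell both ';' and the square brackets would need quoting on the real command line; a different separator may suit better there.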
RE: regex like option *values*
> __DATA__
> abc0[1-9].ctr.[pad,spd].set.in
> abc[01-22].ctr.[pad,spd].set.in
> abcL[1,2,3].ctr.[pad,spd].set.in
> abcL[1,2,3].ctr.[pad,spd].set.in
> abcL[1,2,3].ctr.[70,001].set.in
> ---
> It should work for lists of ranges, and ranges of strings as well as numbers.
> Regarding incorporating into Getopt::Long, see the Tips and Tricks section of the doco.

Brian,

Can this solution be generalized to support values such as

--option_value=abc0[1-9].ctr.[pad,spd].set.in,xxx0[2-8].mmm.[rst,spd].afr.org

That is, the __DATA__ lines all appear on one line separated by commas (instead of newlines). Would it be more efficient to do this in expand_string(), or in the main while loop just before calling expand_string()?
RE: regex like option *values*
> > __DATA__
> > abc0[1-9].ctr.[pad,spd].set.in
> > abc[01-22].ctr.[pad,spd].set.in
> > abcL[1,2,3].ctr.[pad,spd].set.in
> > abcL[1,2,3].ctr.[pad,spd].set.in
> > abcL[1,2,3].ctr.[70,001].set.in
> > ---
> > It should work for lists of ranges, and ranges of strings as well as numbers.
> > Regarding incorporating into Getopt::Long, see the Tips and Tricks section of the doco.
>
> Brian,
> Can this solution be generalized to support values such as
> --option_value=abc0[1-9].ctr.[pad,spd].set.in,xxx0[2-8].mmm.[rst,spd].afr.org
> That is, the __DATA__ lines all appear on one line separated by commas (instead of newlines). Would it be more efficient to do this in expand_string(), or in the main while loop just before calling expand_string()?

Replying back with a solution I can see. With option values like these it becomes difficult to do the usual thing, as below:

GetOptions ("library=s" => \@libfiles);
@libfiles = split(/,/,join(',',@libfiles));

Such mixed strings can be parsed and returned as a list as below. In our context it would be called from the main program before the while loop. After that, the list's elements can be passed to the expand_xxx routine(s) one by one.

# Arg: a string which is the option value, like
# abc0[1-9].ctr.[pad,spd].set.in,xxx0[2-8].mmm.[rst,spd].afr.org,some more values...
sub parse_mix_strings {
    my @x = split (//, $_[0]);
    my $bracket_close;
    my $bracket_open;
    my @elems;
    my @hstrings;
    for (@x) {
        push @elems, $_;
        if ($_ eq '[') { $bracket_open = 1; }
        if ($_ eq ']') {
            if ($bracket_open == 1) { $bracket_close = 1; $bracket_open = 0; }
        }
        if ($_ eq ',' && !$bracket_open && $bracket_close) {
            $elems[$#elems] =~ s/,//;
            push @hstrings, join('', @elems);
            @elems = ();
        }
    }
    push @hstrings, join('', @elems);
    return @hstrings;
}

On *another note*, Getopt::Long could be used this way, I think?

my %list;
GetOptions('list=s%' => sub {
    print "1 = $_[1] 2 = $_[2]\n";
    push(@{$list{$_[1]}}, expand_string($_[2]))
});
print "Elems = ", scalar @{$list{add}}, "\n";  # debug
print @{$list{add}}, "\n";                     # debug
[skip]

And the program can be called as:

prog_name.pl --list add=abc0[1-2].src.spd.in --list add=volvo[1-5].jeep.sch.edu

~TIA
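If the comma has to stay as the outer separator, the character-by-character scan in parse_mix_strings can probably be replaced by a single split with a lookahead. A minimal sketch, assuming brackets are balanced and never nested; split_outside_brackets is a hypothetical name, not something from the thread:

```perl
use strict;
use warnings;

# Split on commas that are NOT inside [...].
# The lookahead rejects a comma if, scanning forward over non-bracket
# characters, the next bracket met is a closing ']'.
sub split_outside_brackets {
    my ($value) = @_;
    return split /,(?![^\[\]]*\])/, $value;
}

my @parts = split_outside_brackets(
    'abc0[1-9].ctr.[pad,spd].set.in,xxx0[2-8].mmm.[rst,spd].afr.org,plain.host'
);
print "$_\n" for @parts;
```

Unlike the flag-based loop, this also splits correctly when a value with no brackets at all comes first in the list.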
RE: regex like option *values*
-----Original Message-----
From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of p sena
Sent: 02 March 2011 17:16
To: perl-win32-users@listserv.ActiveState.com
Subject: regex like option *values*

> Hi,
> I want to use option and values like:
> --option_name abc0[1-9].ctr.{pad,spd}.set.in or
> --option_name abc[01-22].ctr.{pad,spd}.set.in or
> --option_name abcL{1,2,3}.ctr.{pad,spd}.set.in or
> --option_name abcL[1,2,3].ctr.{pad,spd}.set.in or
> --option_name abcL{1,2,3}.ctr.{70,001}.set.in etc. possibilities.
> This should in fact expand those option values into the right number of values, i.e. --option_name will hold multiple values. Instead of supplying values one after another I just want to club them in a regex-like style. I am already using Getopt::Long. What could be the best way to handle this type of option value? Is there any existing module for this?

I could be wrong, but I doubt that an existing module would do what you want. Generating all possible strings that match a regex is hard in the general case, if not impossible. However, if you limit the expressions you want to expand and simplify your syntax a bit, it's not too difficult. Here's a quick hack that, I think, does pretty much what you want.

---
use strict;
use warnings;

while (<DATA>) {
    chomp;
    print "Expanding: $_\n";
    my @result = expand_string($_);
    print "$_\n" for @result;
}

# Expand string to array of strings based on lists & ranges in square
# brackets. Note recursion not strictly necessary, but it simplifies
# the code.
sub expand_string {
    my $str = shift;
    my @result;
    if ($str =~ /^(.*?)\[([^]]+)\](.*)$/) {
        my ($pre, $post) = ($1, $3);
        my @bits = expand_list($2);
        foreach my $bit (@bits) {
            push @result, expand_string("$pre$bit$post");
        }
    }
    else {
        push @result, $str;
    }
    return @result;
}

# Return array from comma separated list of strings and ranges.
sub expand_list {
    my @vals = split /\s*,\s*/, $_[0];
    my @result;
    foreach my $v (@vals) {
        if ($v =~ /^([^-]+)-([^-]+)$/) {
            push @result, eval "'$1'..'$2'";
            die $@ if $@;
        }
        else {
            push @result, $v;
        }
    }
    return @result;
}

__DATA__
abc0[1-9].ctr.[pad,spd].set.in
abc[01-22].ctr.[pad,spd].set.in
abcL[1,2,3].ctr.[pad,spd].set.in
abcL[1,2,3].ctr.[pad,spd].set.in
abcL[1,2,3].ctr.[70,001].set.in
---

It should work for lists of ranges, and ranges of strings as well as numbers.

Regarding incorporating into Getopt::Long, see the Tips and Tricks section of the doco.

HTH

--
Brian Raven
RE: regex like option *values*
> __DATA__
> abc0[1-9].ctr.[pad,spd].set.in
> abc[01-22].ctr.[pad,spd].set.in
> abcL[1,2,3].ctr.[pad,spd].set.in
> abcL[1,2,3].ctr.[pad,spd].set.in
> abcL[1,2,3].ctr.[70,001].set.in
> ---
> It should work for lists of ranges, and ranges of strings as well as numbers.
> Regarding incorporating into Getopt::Long, see the Tips and Tricks section of the doco.
> HTH
> --
> Brian Raven

Thanks Brian,

This solution works only for square brackets, irrespective of whether they contain numbers or strings, right? The curly braces are not required, it seems. This feature is not there in Getopt::Long; can it be implemented in it, or is it configurable from it?

Thanks.
RE: regex like option *values*
-----Original Message-----
From: p sena [mailto:senapati2...@yahoo.com]
Sent: 03 March 2011 15:40
To: perl-win32-users@listserv.ActiveState.com; Brian Raven
Subject: RE: regex like option *values*

> > It should work for lists of ranges, and ranges of strings as well as numbers.
> > Regarding incorporating into Getopt::Long, see the Tips and Tricks section of the doco.
>
> Thanks Brian,
> This solution works only for square brackets, irrespective of whether they contain numbers or strings, right? The curly braces are not required, it seems. This feature is not there in Getopt::Long; can it be implemented in it, or is it configurable from it?

As I said, see 'perldoc Getopt::Long'. A small change to the suggestion in Tips and Techniques would look like..

GetOptions('option_name=s%' => sub { push(@{$list{$_[1]}}, expand_string($_[2])) });

I haven't tried it but it looks like it should work.

HTH

--
Brian Raven
regex like option *values*
Hi,

I want to use option and values like:

--option_name abc0[1-9].ctr.{pad,spd}.set.in or
--option_name abc[01-22].ctr.{pad,spd}.set.in or
--option_name abcL{1,2,3}.ctr.{pad,spd}.set.in or
--option_name abcL[1,2,3].ctr.{pad,spd}.set.in or
--option_name abcL{1,2,3}.ctr.{70,001}.set.in etc. possibilities.

This should in fact expand those option values into the right number of values, i.e. --option_name will hold multiple values. Instead of supplying values one after another I just want to club them in a regex-like style. I am already using Getopt::Long. What could be the best way to handle this type of option value? Is there any existing module for this?

~TIA
Re: Re: Perl Regex
Does this have to be hard coded into the script? Just wondering, since I have been kind of following this thread.

Feb 26, 2010 01:56:02 PM, perl-win32-users-boun...@listserv.activestate.com wrote:

> It looks like what you want to do is attribute folding. That's when you take a nested XML tag and make it an attribute of an enclosing tag. You're doing something slightly different, which is merging equal-depth tags. The right way to do this is with an XML parser. Look into XML::Simple to get started. You would read in the XML to a hash, manipulate the data in the hash, and then write out a new XML file.
>
> Regex can do this in a degenerate case but it becomes unmanageable fast. But since you asked:
>
> $xml =~ s{<index-item>(\s*)<index-entry>([^<]*)</index-entry>\s*<pageId>([^<]*)</pageId>(\s*)</index-item>}{<index-item>$1<index-entry pages="$3">$2</index-entry>$4</index-item>}sg;
>
> HTH
>
> At 09:25 PM 2/26/2010 +0530, Kprasad wrote:
> > Hi All
> > What will be the perfect Regular Expression to convert below mentioned 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.
> > [...]
Re: Perl Regex
Yes, you could use an XML parser to do the job described below, but this case is pretty simple. Here's my offering, leaving out the reading/writing of the files.

--
my $s = <<EOF;
<index-item>
<index-entry>APOE e4 variant</index-entry>
<pageId>18</pageId>
</index-item>
<index-item>
<index-entry>arousal disorders</index-entry>
<see href="c-86679-1" label="see">disorders of arousal</see>
</index-item>
<index-item>
<index-entry>arterial blood gas tests</index-entry>
<pageId>32</pageId>
</index-item>
<index-item>
<index-entry>asthma</index-entry>
<pageId>28--9, 295</pageId>
</index-item>
EOF

$s =~ s{<index-entry>(.*?)</index-entry>\s*<pageId>(.*?)</pageId>}
       {<index-entry pages="$2">$1</index-entry>}g;
print $s;
--

You could replace the two .*? with [^<]* if you wanted to be more precise, but it looks more confusing.

Jon

== original query

> Hi All
> What will be the perfect Regular Expression to convert below mentioned 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.
>
> When I use below mentioned Regex
> <index-entry(?:[^>]+)?>((?!<\/index-entry>).*?)</index-entry>\s*<pageId>([0-9]+)</pageId>
>
> And replaces wrongly
> <index-entry pages="32">arousal disorders</index-entry><see href="c-86679-1" label="see">disorders of arousal</see>
> </index-item>
> .
>
> Search Text:
> <index-item>
> <index-entry>APOE e4 variant</index-entry>
> <pageId>18</pageId>
> </index-item>
> <index-item>
> <index-entry>arousal disorders</index-entry>
> <see href="c-86679-1" label="see">disorders of arousal</see>
> </index-item>
> <index-item>
> <index-entry>arterial blood gas tests</index-entry>
> <pageId>32</pageId>
> </index-item>
> <index-item>
> <index-entry>asthma</index-entry>
> <pageId>28--9, 295</pageId>
> </index-item>
>
> Correct Replacement Text should be:
> <index-item>
> <index-entry pages="18">APOE e4 variant</index-entry>
> </index-item>
> <index-item>
> <index-entry>arousal disorders</index-entry>
> <see href="c-86679-1" label="see">disorders of arousal</see>
> </index-item>
> <index-item>
> <index-entry pages="32">arterial blood gas tests</index-entry>
> </index-item>
> <index-item>
> <index-entry pages="28--29,295">asthma</index-entry>
> </index-item>
>
> Kanhaiya
Perl Regex
Hi All

What will be the perfect Regular Expression to convert below mentioned 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.

When I use below mentioned Regex
<index-entry(?:[^>]+)?>((?!<\/index-entry>).*?)</index-entry>\s*<pageId>([0-9]+)</pageId>

And replaces wrongly
<index-entry pages="32">arousal disorders</index-entry><see href="c-86679-1" label="see">disorders of arousal</see>
</index-item>
.

Search Text:
<index-item>
<index-entry>APOE e4 variant</index-entry>
<pageId>18</pageId>
</index-item>
<index-item>
<index-entry>arousal disorders</index-entry>
<see href="c-86679-1" label="see">disorders of arousal</see>
</index-item>
<index-item>
<index-entry>arterial blood gas tests</index-entry>
<pageId>32</pageId>
</index-item>
<index-item>
<index-entry>asthma</index-entry>
<pageId>28--9, 295</pageId>
</index-item>

Correct Replacement Text should be:
<index-item>
<index-entry pages="18">APOE e4 variant</index-entry>
</index-item>
<index-item>
<index-entry>arousal disorders</index-entry>
<see href="c-86679-1" label="see">disorders of arousal</see>
</index-item>
<index-item>
<index-entry pages="32">arterial blood gas tests</index-entry>
</index-item>
<index-item>
<index-entry pages="28--29,295">asthma</index-entry>
</index-item>

Kanhaiya
RE: Perl Regex
From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Kprasad
Sent: 26 February 2010 15:56
To: perl-win32-users@listserv.ActiveState.com
Subject: Perl Regex

> Hi All
> What will be the perfect Regular Expression to convert below mentioned 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.
> When I use below mentioned Regex
> <index-entry(?:[^>]+)?>((?!<\/index-entry>).*?)</index-entry>\s*<pageId>([0-9]+)</pageId>
> And replaces wrongly

I think it is going to be hard to be of much help. Mostly because you don't show us any Perl.

First, a regular expression can't change anything, it can only match.

Second, I find it easier to work out what is going on with non-trivial regular expressions if I use the 'x' switch, which allows me to break the RE over multiple lines, and include comments. Particularly useful with the 'qr' quoting operator. Your RE, for example, might look like this.

my $re=qr{<index-entry(?:[^>]+)?>
          ((?!<\/index-entry>).*?)
          </index-entry>
          \s*
          <pageId>
          ([0-9]+)
          </pageId>
         }x;

However, as you don't provide any information on how that RE is used, it's going to be difficult to say what might be going wrong. If you could provide a small example script that we could cut, paste & run, it would make it much easier.

Finally, your data looks a lot like XML. A dedicated parser will generally do a more reliable job of parsing XML than regular expressions, even Perl regular expressions.

HTH

--
Brian Raven
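A possible explanation for the wrong replacement, assuming the 'Single Line' option corresponds to Perl's /s modifier: the negative lookahead in the posted pattern is tested only once, before the lazy .*? starts, so with /s the dot is free to run across a closing </index-entry> and into the next item until it finds one that is followed by a pageId. The sketch below applies the qr//x layout suggested above, with [^<]* in place of the dot so that neither group can cross a tag boundary. fold_pages is a hypothetical helper name.

```perl
use strict;
use warnings;

# Commented pattern built with qr//x; whitespace and comments are ignored.
my $re = qr{
    <index-entry>  ( [^<]* )  </index-entry>   # group 1: entry text, no tags inside
    \s*
    <pageId>       ( [^<]* )  </pageId>        # group 2: page list, no tags inside
}x;

# Move the page list into a pages="..." attribute of the preceding entry.
sub fold_pages {
    my ($xml) = @_;
    $xml =~ s{$re}{<index-entry pages="$2">$1</index-entry>}g;
    return $xml;
}

my $in = join "\n",
    '<index-item>',
    '<index-entry>arousal disorders</index-entry>',
    '<see href="c-86679-1" label="see">disorders of arousal</see>',
    '</index-item>',
    '<index-item>',
    '<index-entry>arterial blood gas tests</index-entry>',
    '<pageId>32</pageId>',
    '</index-item>',
    '';

print fold_pages($in);
```

The entry followed by a see element is left alone because the pattern requires pageId to come directly after the closing index-entry tag. For anything less regular than this sample, the parser route recommended in the thread is likely the safer choice.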
Re: Perl Regex
It looks like what you want to do is attribute folding. That's when you take a nested XML tag and make it an attribute of an enclosing tag. You're doing something slightly different, which is merging equal-depth tags. The right way to do this is with an XML parser. Look into XML::Simple to get started. You would read in the XML to a hash, manipulate the data in the hash, and then write out a new XML file.

Regex can do this in a degenerate case but it becomes unmanageable fast. But since you asked:

$xml =~ s{<index-item>(\s*)<index-entry>([^<]*)</index-entry>\s*<pageId>([^<]*)</pageId>(\s*)</index-item>}{<index-item>$1<index-entry pages="$3">$2</index-entry>$4</index-item>}sg;

HTH

At 09:25 PM 2/26/2010 +0530, Kprasad wrote:

> Hi All
> What will be the perfect Regular Expression to convert below mentioned 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.
>
> When I use below mentioned Regex
> <index-entry(?:[^>]+)?>((?!<\/index-entry>).*?)</index-entry>\s*<pageId>([0-9]+)</pageId>
>
> And replaces wrongly
> <index-entry pages="32">arousal disorders</index-entry><see href="c-86679-1" label="see">disorders of arousal</see>
> </index-item>
> .
>
> Search Text:
> <index-item>
> <index-entry>APOE e4 variant</index-entry>
> <pageId>18</pageId>
> </index-item>
> <index-item>
> <index-entry>arousal disorders</index-entry>
> <see href="c-86679-1" label="see">disorders of arousal</see>
> </index-item>
> <index-item>
> <index-entry>arterial blood gas tests</index-entry>
> <pageId>32</pageId>
> </index-item>
> <index-item>
> <index-entry>asthma</index-entry>
> <pageId>28--9, 295</pageId>
> </index-item>
>
> Correct Replacement Text should be:
> <index-item>
> <index-entry pages="18">APOE e4 variant</index-entry>
> </index-item>
> <index-item>
> <index-entry>arousal disorders</index-entry>
> <see href="c-86679-1" label="see">disorders of arousal</see>
> </index-item>
> <index-item>
> <index-entry pages="32">arterial blood gas tests</index-entry>
> </index-item>
> <index-item>
> <index-entry pages="28--29,295">asthma</index-entry>
> </index-item>

--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
"...ne cede malis"
Re: Perl Regex
Here is the chunk of code which I used to perform this task:

open(XML, "<$ARGV[0]") or die "Can not open $ARGV[0]: $!";
my $xmltext;
{
    local $/ = undef;
    $xmltext=<XML>;
}
close(XML);
while($xmltext=~ /<index-entry(?:[^>]+)?>(?:.*?)<\/index-entry>(?:[^\n]*?)<pageId>([^<]+)<\/pageId>/is)
{
    $page=$2;
    $page=~ s/ *\n+\t+/ /g;
    $page=~ s/, /,/g;
    $xmltext=~ s|<index-entry(?:[^>]+)?>(.*?)</index-entry>(?:[^\n]*?)<pageId>[^<]+</pageId>|<index-entry chid="$1" pages="$page">$2</index-entry>|s
}
$xmltext=~ s/<index-entry chid=/<index-entry id=/;
open(XMLOUT, ">$localpath/$xmlfile\_final.xml") or die "Can not open $localpath/$xmlfile\_final.xml: $!";
print XMLOUT $xmltext;
close(XMLOUT);

Thanks
Kanhaiya

----- Original Message -----
From: Brian Raven bra...@nyx.com
To: perl-win32-users@listserv.ActiveState.com
Sent: Friday, February 26, 2010 10:22 PM
Subject: RE: Perl Regex

> > Hi All
> > What will be the perfect Regular Expression to convert below mentioned 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.
> > When I use below mentioned Regex
> > <index-entry(?:[^>]+)?>((?!<\/index-entry>).*?)</index-entry>\s*<pageId>([0-9]+)</pageId>
> > And replaces wrongly
>
> I think it is going to be hard to be of much help. Mostly because you don't show us any Perl.
>
> First, a regular expression can't change anything, it can only match.
>
> Second, I find it easier to work out what is going on with non-trivial regular expressions if I use the 'x' switch, which allows me to break the RE over multiple lines, and include comments. Particularly useful with the 'qr' quoting operator. Your RE, for example, might look like this.
>
> my $re=qr{<index-entry(?:[^>]+)?>
>           ((?!<\/index-entry>).*?)
>           </index-entry>
>           \s*
>           <pageId>
>           ([0-9]+)
>           </pageId>
>          }x;
>
> However, as you don't provide any information on how that RE is used, it's going to be difficult to say what might be going wrong. If you could provide a small example script that we could cut, paste & run, it would make it much easier.
>
> Finally, your data looks a lot like XML. A dedicated parser will generally do a more reliable job of parsing XML than regular expressions, even Perl regular expressions.
>
> HTH
>
> --
> Brian Raven
More efficient regex
Gurus,

In the sample below, I'm checking $foo for all caps or all lowercase. Is there a more efficient regex method?

- Chris

$foo='APPLE JONES PARKER';
if(($foo!~/[A-Z]/)or($foo!~/[a-z]/)){
    $foo=title_case($foo);
}
print $foo."\n";

sub title_case{
    my($string) = @_;
    my @exception_words = ('A', 'The', 'If', 'Is', 'It', 'Of', 'Our', 'An','On', 'In', 'But', 'With', 'Has', 'Had', 'Have');
    my @exception_stuff = ('N','S','E','W','NE','NW','SE','SW','PO','BOX');
    $string =~ s/([\w']+)/\u\L$1/g;
    foreach(@exception_words){$string =~ s/\b$_\b/lc($_)/ge;}  # Make Exception Words LC
    foreach(@exception_stuff){$string =~ s/\b$_\b/$_/gei;}     # Make Exception Stuff Correct Case
    $string =~ s/(.)/\u$1/;  # Uppercase the first letter
    return $string;
}
Re: More efficient regex
I'm guessing that you want to canonicalize the capitalization of string words, right? Like BOB JONES -> Bob Jones.

There is a faster way to check for mixed casedness.

%tolowers = map {$_, 1} ('A', 'The', 'If', 'Is', 'It', 'Of', 'Our', 'An','On', 'In', 'But', 'With', 'Has', 'Had', 'Have');
%touppers = map {$_, 1} ('N','S','E','W','NE','NW','SE','SW','PO','BOX');

$uppers = $text =~ tr/A-Z/A-Z/;  #count uppercase letters
$lowers = $text =~ tr/a-z/a-z/;  #count lowercase letters

if ($uppers and not $lowers) {      #all upper case
    $text = fixcase($text);
} elsif ($lowers and not $uppers) { #all lower case
    $text = fixcase($text);
}

sub fixcase {
    my $text = $_[0];
    my @text = map {ucfirst(lc($_))} split / /, $text;
    foreach $i (@text) {
        $tolowers{$i} and $i = lc $i;
    }
    foreach $i (@text) {
        $touppers{uc $i} and $i = uc $i;
    }
    $text = join ' ', @text;
    # do whatever else
    return $text;
}

That should do it and be about as efficient as possible. :) If you have to deal with sentences then you'll need a few more lines to deal with periods and commas.

These O'Reilly gems are useful too.

Finding all-caps words
@capwords = m/(\b[^\Wa-z0-9_]+\b)/g;

Finding all-lowercase words
@lowords = m/(\b[^\WA-Z0-9_]+\b)/g;

Finding initial-caps word
@icwords = m/(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)/;

At 12:12 PM 2/28/2007 -0500, Chris O wrote:

> In the sample below, I'm checking $foo for all caps or all lowercase. Is there a more efficient regex method?
>
> $foo='APPLE JONES PARKER';
> if(($foo!~/[A-Z]/)or($foo!~/[a-z]/)){
>     $foo=title_case($foo);
> }
> print $foo."\n";
>
> sub title_case{
>     my($string) = @_;
>     my @exception_words = ('A', 'The', 'If', 'Is', 'It', 'Of', 'Our', 'An','On', 'In', 'But', 'With', 'Has', 'Had', 'Have');
>     my @exception_stuff = ('N','S','E','W','NE','NW','SE','SW','PO','BOX');
>     $string =~ s/([\w']+)/\u\L$1/g;
>     foreach(@exception_words){$string =~ s/\b$_\b/lc($_)/ge;}  # Make Exception Words LC
>     foreach(@exception_stuff){$string =~ s/\b$_\b/$_/gei;}     # Make Exception Stuff Correct Case
>     $string =~ s/(.)/\u$1/;  # Uppercase the first letter
>     return $string;
> }

--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
"...ne cede malis"
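The counting test from this reply can be isolated into a small predicate. A minimal sketch, assuming ASCII letters only; is_single_case is a hypothetical name. Note one difference from the original condition in the question: a string with no letters at all is reported as not single-case here.

```perl
use strict;
use warnings;

# True when the text contains letters of exactly one case.
# tr/// with an empty replacement list only counts; it does not modify $text.
sub is_single_case {
    my ($text) = @_;
    my $upper = ($text =~ tr/A-Z//);
    my $lower = ($text =~ tr/a-z//);
    return ($upper xor $lower) ? 1 : 0;
}

for my $s ('APPLE JONES PARKER', 'apple jones', 'Apple Jones', '12345') {
    printf "%-20s %s\n", $s, is_single_case($s) ? 'needs title-casing' : 'leave as is';
}
```

tr makes one pass per count without starting the regex engine, which is presumably where the claimed speed-up comes from; whether it matters would need measuring on real data.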
Regex: Remove A HREF tag
Hi,

I'm trying to use a regex to strip the A HREF tags from an HTML text, but keep the http link intact. Here is an example.

Here is the dirty text:

$dirty = "Check <A HREF=\"http://www.opengroup.org/cde/\" TARGET=\"_blank\">The Open Group's Web site</A> for updates. <P> For Solaris/Sun OS, use <A HREF=\"http://www.securityfocus.com/archive/1/358426\" TARGET=\"_blank\"> this workaround</A> for protecting the 'dtlogin' service from remote access </A>. Sun also released a patch available at <A HREF=\"http://sunsolve.sun.com/search/document.do?assetkey=1-26-57539-1\" TARGET=\"_blank\">Sun Alert 57539</A>.";

** The '\' were added so that the " (double quotes) are treated as text.

Here is how the text should look:

$clean = "Check [http://www.opengroup.org/cde/] {The Open Group's Web site} for updates. For Solaris/Sun OS, use [http://www.securityfocus.com/archive/1/358426] this workaround for protecting the 'dtlogin' service from remote access. Sun also released a patch available at [http://sunsolve.sun.com/search/document.do?assetkey=1-26-57539-1] {Sun Alert 57539}";

I'm using this to remove any HTML tags, but it also removes the HREF tags:

# remove all HTML TAGS
$solution =~ s/<[^>]*>//gs;
# remove all escape chars like &gt; &quot;
$solution =~ s/&gt;//gs;
$solution =~ s/&quot;//gs;

Can you help?

--
Eyal Edri | System Security Engineer | [EMAIL PROTECTED] Communication.
Re: Regex: Remove A HREF tag
First convert the links, then strip the HTML tags.

$text =~ s/<a\s[^>]*?href="?([^"\s>]+)"?[^>]*>(.+?)<\/a>/[$1] {$2}/igs;

--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
"...ne cede malis"
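The two steps (convert links first, strip the remaining tags second) might look like this as a runnable sketch. It assumes double-quoted href values and anchors that do not nest; links_to_text is a hypothetical helper name, and real-world HTML is better handled by a proper HTML parser.

```perl
use strict;
use warnings;

sub links_to_text {
    my ($html) = @_;

    # Step 1: <A HREF="url" ...>text</A>  ->  [url] {text}
    $html =~ s!<a\b[^>]*?\bhref\s*=\s*"([^"]*)"[^>]*>(.*?)</a>![$1] {$2}!gis;

    # Step 2: remove every tag that is still left (P, stray /A, ...)
    $html =~ s!<[^>]*>!!g;

    return $html;
}

my $dirty = 'Check <A HREF="http://www.opengroup.org/cde/" TARGET="_blank">'
          . "The Open Group's Web site</A> for updates. <P>Done";

print links_to_text($dirty), "\n";
```

The order matters: running the generic tag stripper first would throw away the href values before they can be captured.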
Perl -pi -e 'regex'
I've had some trouble with the command-line syntax for a string search and replace using a perl command line. My cmd line was:

perl -pi -e 's!\xae!\&#169!g' <list of files>

This did not replace the occurrences of hex EA.

The script that the above cmd line should compile into (according to the Camel Book) is:

__BEGIN__
#!perl.exe
$extension = '*';
LINE: while (<>) {
    if ($ARGV ne $oldargv) {
        if ($extension !~ /\*/) {
            $backup = $ARGV . $extension;
        }
        else {
            ($backup = $extension) =~ s/\*/$ARGV/g;
        }
        rename($ARGV, $backup);
        open(ARGVOUT, ">$ARGV");
        select(ARGVOUT);
        $oldargv = $ARGV;
    }
    s/\xae/\&#169\;/g;
}
continue {
    print;  # this prints to original filename
}
select(STDOUT);
__END__

Running this actual script works, but not the command-line version. Did I not escape the command line properly?
Re: Perl -pi -e 'regex'
Adam R. Frielink wrote:

> I've had some trouble with the command-line syntax for a string search and replace using a perl command line. My cmd line was:
>
> perl -pi -e 's!\xae!\&#169!g' <list of files>
>
> This did not replace the occurrences of hex EA.
>
> The script that the above cmd line should compile into (according to the Camel Book) is:
> ...
> s/\xae/\&#169\;/g;
> ...
> Running this actual script works, but not the command-line version. Did I not escape the command line properly?

Looks like you forgot the \; in the command line.

--
Lyle Kopnicky
Software Project Engineer
Veicon Technology, Inc.
Re: Perl -pi -e 'regex'
Adam R. Frielink wrote:

> I've had some trouble with the command-line syntax for a string search and replace using a perl command line. My cmd line was:
>
> perl -pi -e 's!\xae!\&#169!g' <list of files>

I tried this on tcsh and cmd.exe. cmd.exe needs " instead of ', tcsh doesn't like !, and Perl wants a backup extension:

perl -pi.bak -e "s{\xae}{&#169}g" foo

> This did not replace the occurrences of hex EA.

You mean AE?

> The script that the above cmd line should compile into (according to the Camel Book) is:
> [script snipped]
> Running this actual script works, but not the command-line version. Did I not escape the command line properly?
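One way to sidestep shell quoting altogether is to do the in-place edit from a small script by setting $^I, the variable behind the -i switch. A minimal sketch under assumptions: the replacement text is taken to be the entity &#169; followed by a semicolon, and replace_in_files is a hypothetical helper name.

```perl
use strict;
use warnings;

# In-place edit of the given files, keeping a backup with $backup_ext.
sub replace_in_files {
    my ($backup_ext, @files) = @_;
    return unless @files;

    local $^I   = $backup_ext;   # same effect as -i.bak on the command line
    local @ARGV = @files;        # <> reads from these files

    while (<>) {
        s/\xae/&#169;/g;         # byte 0xAE -> numeric character reference
        print;                   # goes to the file currently being rewritten
    }
    return;
}

# Usage: perl fix_ae.pl file1 file2 ...   (the script name is made up)
replace_in_files('.bak', @ARGV);
```

Because the substitution lives in a file, neither cmd.exe nor tcsh gets a chance to interpret the quotes, the ! or the & characters.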
AW: Problem with regex
> use strict;
> use warnings;
> my $Data = 'Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!';
> $Data =~ s/([^\$]*)\${3,3}([^\$]+)/$1\<br\>\<br\>$2/gm;
> $Data =~ s/([^\$]*)\${2,2}([^\$]+)/$1\<p\>$2/gm;
> $Data =~ s/([^\$]*)\${1,1}([^\$]+)/$1\<br\>$2/gm;
> print "Data: $Data \n";
> ___END___
>
> Notice, I changed the double quotes to single quotes for $Data. For me, the regex is clear. But if not for you, I can explain. There may be better solutions; this is just a quick one.

Hello,

First of all, many thanks for your quick and helpful replies. I tried Karl-Heinz's solution and it works very well.

Karl-Heinz: Yes, the regex is clear to me; the solution with $1 and $2 was a good idea.

regards
Holgi

p.s. next time i should first take the Owls with me in the bath tub ;-)
Problem with regex
Hello,

under Windows with ActiveState Perl i have a strange problem with a regex. Assuming the following string:

    my $Data = "Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!";

The regex should replace $ with the string <br>, $$ with <p> and $$$ with <br><br> (please don't think about the why). I tried to use the following:

    $data =~ s/\$\$\$/<br><br>/gm; #should catch every occurrence of $$$
    $data =~ s/\$\$/<p>/gm; #should catch $$
    $data =~ s/\$/<br>/gm; #the rest

So data should look like this after the first regex:

    Hello, i am a litte String.$Please format me.<br><br>I am the end of the String.$$And i am the last!

And after the second:

    Hello, i am a litte String.$Please format me.<br><br>I am the end of the String.<p>And i am the last!

And the last:

    Hello, i am a litte String.<br>Please format me.<br><br>I am the end of the String.<p>And i am the last!

But all regexes i tried (the ones above are only one try) failed! When i print out the string it looks like:

    Hello, i am a litte String. Please format me. I am the end of the String.3398 And i am the last!

Where the number after String. differs between every run. Can someone help me?

With regards
Holger
Re: Problem with regex
> But all regexes i tried (the one above are only one try) failed! When i print out the string it looks like:
>
>     Hello, i am a litte String. Please format me. I am the end of the String.3398 And i am the last!
>
> Where the number after String. differs between every run. Can someone help me ?

This works at least on my machine:

    use strict;
    use warnings;
    my $Data = 'Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!';
    $Data =~ s/([^\$]*)\${3,3}([^\$]+)/$1\<br\>\<br\>$2/gm;
    $Data =~ s/([^\$]*)\${2,2}([^\$]+)/$1\<p\>$2/gm;
    $Data =~ s/([^\$]*)\${1,1}([^\$]+)/$1\<br\>$2/gm;
    print "Data: $Data \n";
    ___END___

Notice, I changed the double quotes to single quotes for $Data. For me, the regex is clear. But if it is not for you, I can explain. There may be better solutions; this is just a quick one.

Regards
Karl-Heinz
RE: Problem with regex
Holger,

Actually, $ is a special character in strings in Perl. So if a $ appears in the input, you always have to write it with a leading escape character. Make your input look like this:

    my $data = "Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!";

It will solve your problem.

Thanks,
Seema
GPCT|TDDS|AIS|SPCM3

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Holger Wöhle
Sent: Friday, May 12, 2006 6:09 PM
To: perl-win32-users@listserv.ActiveState.com
Subject: Problem with regex

> When i print out the string it looks like:
>
>     Hello, i am a litte String. Please format me. I am the end of the String.3398 And i am the last!
>
> Where the number after String. differs between every run. Can someone help me ?
Re: Problem with regex
Holger,

This worked for me. Note that you need to escape the $ characters in your string. The 3398 number is actually the PID of the perl process, returned from the special variable $$, since you didn't escape the $ characters.

    my $Data = "Hello, i am a litte String.\$ Please format me.\$\$\$ I am the end of the String.\$\$ And i am the last!";
    $Data =~ s/[\$]{3}/<br><br>/;
    $Data =~ s/[\$]{2}/<p>/;
    $Data =~ s/\$/<br>/;
    print $Data . "\n";

Hope that helps...

Andy Speagle

On 5/12/06, Holger Wöhle [EMAIL PROTECTED] wrote:
> When i print out the string it looks like:
>
>     Hello, i am a litte String. Please format me. I am the end of the String.3398 And i am the last!
>
> Where the number after String. differs between every run. Can someone help me ?
RE: Problem with regex
At 09:47 AM 5/12/2006, Yekhande, Seema (MLITS) wrote:
> Holger, Actually $ is a special character in string in perl. So, if the $ is there in the input, you will have to always write it with the leading escape character. So, make your input will be like this,
>
>     my $data = "Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!";
>
> It will solve your problem.

$ is only special in strings with double quote marks (" ") around them. I think you meant to say:

    my $data = "Hello, i am a little String.\$ Please format me.\$\$\$ I am the end of the String.\$\$ And i am the last!";

That works, but you can also use:

    my $data = 'Hello, i am a little String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!';

(Note the type of quote mark used.)

If you were to print out the original string data like this:

    my $data = "Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!";
    print("$data\n");

you would get this:

    Hello, i am a litte String. format me. I am the end of the String.1896 And i am the last!

i.e., the original string did not have any '$' characters in it at all.
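The quoting point can be checked with a few lines of Perl. This is only a sketch: the sample text is shortened for illustration, and the <br> and <p> replacements follow the original question. It shows that "$$" inside double quotes interpolates the process ID, while single quotes keep every $ so the three substitutions (longest run first) have something to match.

```perl
use strict;
use warnings;

# Double quotes interpolate: "$$" becomes the process ID of this script.
my $interpolated = "end of the String.$$ And";

# Single quotes keep every $ literally (shortened sample text).
my $data = 'a.$ b.$$$ c.$$ d';

# Replace the longest run first so that $$$ is not consumed by the shorter rules.
$data =~ s/\$\$\$/<br><br>/g;
$data =~ s/\$\$/<p>/g;
$data =~ s/\$/<br>/g;

print "$interpolated\n";
print "$data\n";    # a.<br> b.<br><br> c.<p> d
```

The /g flag matters as soon as a marker can occur more than once in the text; without it only the first occurrence is replaced.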
Re: Regex Needed
Unpack is even faster, for fixed-format strings.

dZ.

On Mar 24, 2006, at 22:19, Chris Wagner wrote:
> But if u do know the format ahead of time (like it never changes for one application) then u shouldn't use a regex. Using substr will be faster.
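A minimal sketch of the unpack idea, assuming a hypothetical fixed-width layout (four characters, a space, four characters, a space, three letters, three digits). The real column positions are not recoverable from the thread, so the template and the substr offset below are assumptions that only fit this one sample string.

```perl
use strict;
use warnings;

# Assumed layout (hypothetical): 4 chars, space, 4 chars, space, 3 letters, 3 digits.
my $string = 'MBH1 WELL PIT050';

# A4 = four characters (trailing spaces stripped), x = skip one byte.
my ($unit, $site, $prefix, $number) = unpack 'A4 x A4 x A3 A3', $string;

# The same field via substr: offset 10, length 3 for this layout.
my $via_substr = substr $string, 10, 3;

print "$prefix $via_substr\n";    # PIT PIT
```

Both approaches break as soon as a field changes width, which is the case for the three formats in the question, so they only apply when one format is known in advance.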
Regex Needed
Hello,

I am looking for help on a regex that examines strings such as

    xxxN yyy sssNNN
    xxxN yyyNyyy sss
    xxxN yyyNyyy ssN

and returns only the sss part? N is always a numeral, and s is always alphabetic.

Here is what I have so far as an example. I believe there is an eloquent way to do this in a single regex.

    my ( $string, $prefix );
    $string = "MBH1 WELL PIT050";
    ($prefix) = $string =~ # I want $prefix to equal PIT

Thank you.
RE: Regex Needed
Paul Rousseau wrote, on Friday, March 24, 2006 12:38 PM
: I am looking for help on a regex that examines strings such as
:
:     xxxN yyy sssNNN
:     xxxN yyyNyyy sss
:     xxxN yyyNyyy ssN
:
: and returns only the sss part? N is always a numeral, and s
: is always alphabetic.

Does /.*(\d+)/ do what you want? Or is there more to the string after what you've shown?

Good luck,

Joe
RE: Regex Needed
[EMAIL PROTECTED] wrote:
> Hello,
>
> I am looking for help on a regex that examines strings such as
>
>     xxxN yyy sssNNN
>     xxxN yyyNyyy sss
>     xxxN yyyNyyy ssN
>
> and returns only the sss part? N is always a numeral, and s is always alphabetic.
>
> Here is what I have so far as an example. I believe there is an eloquent way to do this in a single regex.
>
>     my ( $string, $prefix );
>     $string = "MBH1 WELL PIT050";
>     ($prefix) = $string =~ # I want $prefix to equal PIT

If it is really of that format, then

    /\s(\D+)\d+$/

is one shot, which looks for a space followed by non-numeric characters, then digits, and then end of line or data.

Wags ;)

> Thank you.
RE: Regex Needed
    $string = "MBH1 WELL PIT050";
    $string =~ s/.* (.*?)\d+/$1/; # Questionmark makes it non-greedy
    ($prefix) = $string; # Didn't figure out how to do ($prefix) = $string =~
    print $prefix;
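For the "($prefix) = $string =~" form that the question asks about, a match in list context returns its captures, so the assignment can be done in one statement. The following is a sketch only; prefix_of is a hypothetical helper name, and the pattern assumes two whitespace-separated fields come before the one of interest, as in the three sample formats.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the letters that start the third field,
# or undef when the string does not have the assumed shape.
sub prefix_of {
    my ($string) = @_;
    # List-context match: the single capture group lands in $prefix.
    my ($prefix) = $string =~ /^\S+\s+\S+\s+([A-Za-z]+)\d*$/;
    return $prefix;
}

print prefix_of('MBH1 WELL PIT050'), "\n";    # PIT
```

The parentheses around $prefix are what force list context; without them the assignment stores the success flag of the match instead of the captured text.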
Re: Regex Needed
Paul,

Give this a shot:

    /^\w+\s+\w+\s+([A-Za-z]+)\d+/

A regex should be as explicit and exclusive as possible, so I would remove the lowercase (a-z) portion of the character class if you know for sure that the letters you want will always be uppercase.

-Brian

_
Brian H. Oak CISSP CISA
Acorn Networks Security
http://acornnetsec.com/
Re: Regex Needed
Try this:

    $string =~ /^.{3}\d\s[^\s]+\s([a-zA-Z]+)\d+$/;
    $prefix = $1;

That should match:
 - any three characters at the beginning of the string: ^.{3}
 - followed by a number: \d
 - followed by whitespace: \s
 - followed by any one or more characters until the next whitespace: [^\s]+
 - followed by whitespace: \s
 - grab all the following characters that are letters: ([a-zA-Z]+)
 - followed by 1 or more numbers until the end of the string: \d+$

Is that an accurate description?

-dZ.

----- Original Message -----
From: Paul Rousseau
Sent: 3/24/2006 1:38:07 PM
To: Perl-Win32-Users@listserv.ActiveState.com
Subject: Regex Needed

> I am looking for help on a regex that examines strings such as
>
>     xxxN yyy sssNNN
>     xxxN yyyNyyy sss
>     xxxN yyyNyyy ssN
>
> and returns only the sss part? N is always a numeral, and s is always alphabetic.
Re: Regex Needed
Something like this:

    /(\w+\s){2}([a-zA-Z]+)\d*/

David

Joe Discenza wrote:
> Does /.*(\d+)/ do what you want? Or is there more to the string after what you've shown?
Re: Regex Needed
At 10:38 AM 3/24/2006 -0700, Paul Rousseau wrote:
> I am looking for help on a regex that examines strings such as
>
>     xxxN yyy sssNNN
>     xxxN yyyNyyy sss
>     xxxN yyyNyyy ssN
>
> and returns only the sss part? N is always a numeral, and s is always alphabetic.

Do u have to examine those as fixed strings or as variable strings? Meaning do u know ahead of time which format ur looking at. If u don't know the format ahead of time then u should use the regex. But if u do know the format ahead of time (like it never changes for one application) then u shouldn't use a regex. Using substr will be faster.

    xxxN yyy sssNNN
    $s = substr $string, 13, 3;

    xxxN yyyNyyy sss
    $s = substr $string, 13, 3;

    xxxN yyyNyyy ssN
    $s = substr $string, 13, 6;

    #don't know what format $string will be
    ($s) = $string =~ m/\S+ \S+ ([a-z]+)/i;

--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis
0100
Re: Compiled regex error?
$Bill Luebkert wrote:
> Maurice Height wrote:
>>     $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/os;
>>
>> Now when I create 2 class objects, each with a different value of $arg{delim_str}, the first instance works correctly, but the second seems to be using the same value that was created in the first instance.
>
> To my understanding, that's what it's supposed to do. The /o says you don't have to re-interpolate the contents of $arg{delim_str} after the first time. So just remove the /o and you should be fine.

There are two ways to keep a regex from being recompiled on every use: either you add /o, and then the regex is compiled only once per run (as was correctly the case in your script), or you use qr/.../ to get a compiled regex in a scalar and match against that, in which case the regex is compiled only when the qr/.../ statement is evaluated. You should not mix the two solutions as you did.

--
Jedaï
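The qr// approach without /o can be sketched as follows. The class name DelimSplitter and its fields method are hypothetical stand-ins for the ABC class in the thread, whose code was not posted. Each constructor call evaluates qr// again, so every object gets a pattern built from its own delimiter.

```perl
package DelimSplitter;    # hypothetical stand-in for the poster's class ABC
use strict;
use warnings;

sub new {
    my ($class, %arg) = @_;
    my $self = bless {}, $class;
    # No /o: the pattern is built from this object's own delimiter.
    $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/s;
    return $self;
}

# Split a line on this object's delimiter.
sub fields {
    my ($self, $line) = @_;
    return split $self->{DELIM_RE}, $line;
}

package main;

my $pipe  = DelimSplitter->new( delim_str => q{|} );
my $comma = DelimSplitter->new( delim_str => q{,} );

print join('-', $pipe->fields('a|b,c')), "\n";     # a-b,c
print join('-', $comma->fields('a|b,c')), "\n";    # a|b-c
```

The \Q...\E escaping is kept from the original code, since a delimiter such as | would otherwise be read as regex alternation.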
Compiled regex error?
I have just solved a bug in my code involving a compiled regex. I am wondering if I have got it wrong or if this is a Perl error. To explain...

I had a class ABC in which I passed a value to the constructor (eg: $arg{delim_str}) and stored this for later use throughout the class:

    $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/os;

Now when I create 2 class objects, each with a different value of $arg{delim_str}, the first instance works correctly, but the second seems to be using the same value that was created in the first instance. For example:

    my $abc1 = ABC->new( delim_str => q{|} );
    # do some stuff with $abc1
    my $abc2 = ABC->new( delim_str => q{,} );
    # do some stuff with $abc2
    *** does not work because the value of $self->{DELIM_RE}
    *** used in $abc2 is the same as that in $abc1

However if I remove the 'o' option from the regex, everything is OK.

    $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/s;

I had assumed that even though a regex is compiled once only with the 'o' option, that instances of class data would be INDEPENDENT of each other.

Any comments welcome...

Maurice
Re: Compiled regex error?
Maurice Height wrote:
> I have just solved a bug in my code involving a compiled regex. I am wondering if I have got it wrong or if this is a Perl error. To explain...
>
> I had a class ABC in which I passed a value to the constructor (eg: $arg{delim_str}) and stored this for later use throughout the class:
>
>     $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/os;
>
> Now when I create 2 class objects, each with a different value of $arg{delim_str}, the first instance works correctly, but the second seems to be using the same value that was created in the first instance.

To my understanding, that's what it's supposed to do. The /o says you don't have to re-interpolate the contents of $arg{delim_str} after the first time. So just remove the /o and you should be fine.

> For example:
>
>     my $abc1 = ABC->new( delim_str => q{|} );
>     # do some stuff with $abc1
>     my $abc2 = ABC->new( delim_str => q{,} );
>     # do some stuff with $abc2
>     *** does not work because the value of $self->{DELIM_RE}
>     *** used in $abc2 is the same as that in $abc1
>
> However if I remove the 'o' option from the regex, everything is OK.
>
>     $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/s;
>
> I had assumed that even though a regex is compiled once only with the 'o' option, that instances of class data would be INDEPENDENT of each other.

Apparently not a good assumption.

perlretut man page: Part 1: The basics ... Using regular expressions in Perl ...

    There are a few more things you might want to know about matching operators. First, we pointed out earlier that variables in regexps are substituted before the regexp is evaluated:

        $pattern = 'Seuss';
        while (<>) {
            print if /$pattern/;
        }

    This will print any lines containing the word Seuss. It is not as efficient as it could be, however, because perl has to re-evaluate $pattern each time through the loop. If $pattern won't be changing over the lifetime of the script, we can add the //o modifier, which directs perl to only perform variable substitutions once:

        #!/usr/bin/perl
        # Improved simple_grep
        $regexp = shift;
        while (<>) {
            print if /$regexp/o; # a good deal faster
        }
RE: Yet another regex question
> I'd like to thank everybody who came up with suggestions. One thing I forgot to point out is that there are also people with whitespace in their *given* names, which seems to make things even more problematic

I've updated my solution to accommodate that:

    while (<DATA>) {
        my @cols = m/^(\d+)      #id1
                     \s(\(\d+\)) #id2
                     \s([\w ]+), #lastnames
                     \s([^\d]+)  #first name
                     \s([\d\.]+) #data1
                     \s([\d\.]+) #data2
                     \s([\d\.]+) #data3
                     \s([\d\.]+) #data4
                     \s(\w+)     #country code
                     \s([\d\.]+) #data5
                    /x;
        printf "%s\n", join "\t", @cols;
    }
    __DATA__
    1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
    2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
    28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
    29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
    30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00
    77 (84) MONTOYA, INIGO CONQUISTADOR 100.22 23 16.00 20.00 ESP 2.00
Yet another regex question
I'm fairly good at using regexes to find things, but using them to *replace* things is something I find quite difficult. I have a text file with lines like this:

<snip>
1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
[...]
28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00
</snip>

that I want to turn into a tab-delimited file. Unfortunately, I can't simply turn all spaces into tabs: note that there are people with two-word surnames. However, I'm having difficulty coming up with any ideas on how to find information on either side of a space, and changing that into something plus a tab. I tried the following to deal with the country codes:

    while (<FILEFROM>) {
        chomp;
        if ($_ =~/\d\s[A-Z]{3}\s/) {
            $_ = s/$1/$1\t/g;
        }
        print FILETO "$_\n";
    }

But all I got was a file with a bunch of lines of 1's. I tried escaping the $'s, figuring it wouldn't help, and it didn't: I ended up with an empty file. I also tried putting () around each of the $'s, and that gave me an even odder file, with each line containing a two-digit number, with no relationship I can spot between the numbers and the country codes: USA produces 54, 51, and 52 the first three times it's matched, while RUS produces 50, 54, 54, and 56 the first four times it's matched.

I've tried reading perlre, and it's given me no help. I don't know where to begin.

--
Ted Schuerzinger, [EMAIL PROTECTED]
Re: Yet another regex question
If you want a tab instead of a space after each country code, try this:

    while (<FILEFROM>) {
        if (/\d\s[A-Z]{3}\s/) {
            s/(\d\s[A-Z]{3})\s/$1\t/g;
        }
        print FILETO $_;
    }
AW: Yet another regex question
Probably not the best, but a working solution is to split the string using split, find out whether there are too many fields, treat the extra ones as parts of two-or-more-word names, join the corresponding name fields with a space, and join the whole thing with tabs.

Dietmar

--- snip ---
> that I want to turn into a tab-delimited file. Unfortunately, I can't simply turn all spaces into tabs: note that there are people with two-word surnames.
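The split-and-rejoin idea could look roughly like this. It is a sketch under one assumption taken from the sample data: every line starts with two id columns and ends with six columns (four numbers, the country code, one more number), so whatever is left in the middle is the name. The function name to_tsv is made up for the example.

```perl
use strict;
use warnings;

# Assumed layout: 2 id columns first, 6 columns last, the name in between.
sub to_tsv {
    my ($line) = @_;
    my @fields = split ' ', $line;       # split on runs of whitespace
    my @ids    = splice @fields, 0, 2;   # the two leading id columns
    my @tail   = splice @fields, -6;     # four numbers, country code, last number
    # Whatever remains is the name, however many words it has.
    return join "\t", @ids, join(' ', @fields), @tail;
}

print to_tsv('29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00'), "\n";
```

Counting from both ends avoids having to know how many words a name has, which also covers multi-word given names.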
RE: Yet another regex question
Ted Schuerzinger wrote, on Thu 12-Jan-06 08:45
: I have a text file with lines like this:
:
: 1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
: 2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
: [...]
: 28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
: 29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
: 30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00
:
: that I want to turn into a tab-delimited file. Unfortunately, I can't simply
: turn all spaces into tabs: note that there are people with two-word surnames.

Part of the problem with this code

: if ($_ =~/\d\s[A-Z]{3}\s/) {
:     $_ = s/$1/$1\t/g;
: }

is you have no capturing parentheses to populate $1. Toss this code.

You seem to have a pretty good picture of your data; why not turn that into a regex completely, instead of doing it piecemeal?

    /(\d+)\s+\((\d+)\)\s+([A-Z\s]+),\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+([A-Z]{3})\s+(\S+)/

and have a replace section that strings together all your captures with tabs between:

    s/.../$1\t$2\t$3\t$4\t$5\t$6\t$7\t$8\t$9\t${10}/

You don't need the parentheses around field 2, or the comma after the last name, do you? If so, you can put those inside the captures.

Good luck,

Joe

==
Joseph P. Discenza, Sr. Programmer/Analyst
mailto:[EMAIL PROTECTED]
Carleton Inc. http://www.carletoninc.com
574.243.6040 ext. 300 fax: 574.243.6060
Providing Financial Solutions and Compliance for over 30 Years
Re: Yet another regex question
At 08:45 AM 1/12/2006 -0500, [EMAIL PROTECTED] wrote:
> 1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
> 2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
> [...]
> 28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
> 29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
> 30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00

    print join "\t", $line =~ m/^(\d+) \((\d+)\) ([a-zA-Z ]+), (\w+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+)$/;
RE: Yet another regex question
> I'm fairly good at using regexes to find things, but using them to *replace* things is something I find quite difficult. I have a text file with lines like this:
>
> <snip>
> 1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
> 2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
> [...]
> 28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
> 29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
> 30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00
> </snip>
>
> that I want to turn into a tab-delimited file.

Well, you haven't let us know what the output is supposed to look like, but try this for starters:

    while (<DATA>) {
        my @cols = m/^(\d+)      #id1
                     \s(\(\d+\)) #id2
                     \s([\w ]+), #lastnames
                     \s(\w+)     #first name
                     \s([\d\.]+) #data1
                     \s([\d\.]+) #data2
                     \s([\d\.]+) #data3
                     \s([\d\.]+) #data4
                     \s(\w+)     #country code
                     \s([\d\.]+) #data5
                    /x;
        printf "%s\n", join "\t", @cols;
    }
    __DATA__
    1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
    2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
    28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
    29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
    30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00
Re: Yet another regex question
At 10:19 AM 1/12/2006, Artem Avetisyan wrote:
> If you want tab instead of space after each country code, try this:
>
>   while (<FILEFROM>) {
>     if (/\d\s[A-Z]{3}\s/) {
>       s/(\d\s[A-Z]{3})\s/$1\t/g;
>     }
>     print FILETO $_;
>   }

I don't see the point of the if statement. Why not just do the s/.../g as the only statement in the while loop?
Re: Yet another regex question
Ted wrote:
>   while (<FILEFROM>) {
>     chomp;
>     if ($_ =~ /\d\s[A-Z]{3}\s/) {
>       $_ = s/$1/$1\t/g;
>     }
>     print FILETO "$_\n";
>   }

You were close, Ted, but there are a couple of problems.

1. $_ = s/$1/$1\t/g; should be $_ =~ s/$1/$1\t/g; (you left out the ~)
2. $1 isn't defined anywhere in this code since there are no parens in the first regex.

Try this (similar to what Artemave posted):

  while (<FILEFROM>) {
    chomp;
    $_ =~ s/(\d\s[A-Z]{3})\s/$1\t/;
    print FILETO "$_\n";
  }
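To make the two problems concrete, here is a small sketch. The helper names are invented for illustration; the pattern is the one from the thread, with the trailing whitespace kept outside the capture so that it is replaced by the tab rather than kept alongside it.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the space after a country code with a tab.
sub tab_after_country {
    my ($line) = @_;
    $line =~ s/(\d\s[A-Z]{3})\s/$1\t/g;   # =~ binds s/// to $line and edits it in place
    return $line;
}

# What plain = does instead: s/// runs on $_, and its return value
# (the number of replacements made) then overwrites the edited text.
sub assign_instead_of_bind {
    local $_ = shift;
    $_ = s/(\d\s[A-Z]{3})\s/$1\t/g;
    return $_;
}

print tab_after_country("1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00"), "\n";
print assign_instead_of_bind("49.00 USA .00"), "\n";
```

The second call prints a count rather than a line of data, which is the symptom to look for when a ~ goes missing.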
Re[2]: Yet another regex question
True. Wanted to be the first replier ;)

Artem A. Avetisyan

> I don't see the point of the if statement. Why not just do the s/.../g as the only statement in the while loop?
RE: Yet another regex question
Joe Discenza [EMAIL PROTECTED] graced perl with these words of wisdom:
> You seem to have a pretty good picture of your data; why not turn that into a regex completely, instead of doing it piecemeal?
>
>   /(\d+)\s+\((\d+)\)\s+([A-Z\s]+),\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+([A-Z]{3})\s+(\S+)/
>
> and have a replace section that strings together all your captures with tabs between:
>
>   s/.../$1\t$2\t$3\t$4\t$5\t$6\t$7\t$8\t$9\t${10}/

I'd like to thank everybody who came up with suggestions. One thing I forgot to point out is that there are also people with whitespace in their *given* names, which seems to make things even more problematic (backtracking and all that).

Somebody off-list gave me the suggestion not of using a regex, but of splitting each line on the \s characters, and then manipulating arrays with pop and shift and reverse. That's something that I'm decidedly more able to handle, although I have a question -- when I used this bit of code:

<snip>
    $transfer[2] = scalar @line; # @line only has the names left
    for $x (0 .. 8) {
      print FILETO "$transfer[$x]\t";
    }
    print FILETO "\n"
  }

The results were as follows:

<snip>
1 (1) 2 3380.00 16 .00 49.00 USA .00
2 (2) 2 3206.00 17 .00 .00 BEL .00
3 (3) 2 2851.00 19 .00 .00 FRA 1.00
[...]
</snip>

I had to amend the code to add an if/else clause for the $x=2 case:

    $transfer[2] = scalar @line;
    for $x (0 .. 8) {
      if ($x == 2) {
        print FILETO "@line\t";
      } else {
        print FILETO "$transfer[$x]\t";
      }
    }
    print FILETO "\n"
  }

in order to get it to work. Any ideas why?

--
Ted
fedya at bestweb dot net
Oh Marge, anyone can miss Canada, all tucked away down there --Homer Simpson
RE: Yet another regex question
At 04:43 PM 1/12/2006 -0500, Ted Schuerzinger wrote:
>     $transfer[2] = scalar @line; # @line only has the names left
>     for $x (0 .. 8) {
>       print FILETO "$transfer[$x]\t";
>     }
>     print FILETO "\n"
>   }

If @line contains the name components then u need to do join " ", @line to get the contents out. scalar @line returns the number of elements.
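The split-based idea, together with the join fix, might look something like the sketch below. Ted's full script is not shown in the thread, so the helper name and the use of splice (rather than his shift/pop calls) are assumptions; the column counts (two leading fields, six trailing ones) are read off the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: split a ranking line on whitespace, peel off the
# fixed columns at both ends, and treat whatever remains as the name.
sub ranking_line_to_tsv {
    my ($line) = @_;
    my @line = split ' ', $line;          # split on runs of whitespace
    return unless @line >= 9;             # 2 leading + at least 1 name word + 6 trailing
    my @front = splice @line, 0, 2;       # rank and (previous rank)
    my @back  = splice @line, -6;         # six trailing columns
    # @line now holds only the name words. scalar(@line) would give
    # their count; join gives the words themselves.
    my $name = join ' ', @line;
    return join "\t", @front, $name, @back;
}

print ranking_line_to_tsv("29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00"), "\n";
```

Because the name is whatever is left in the middle, multi-word surnames and multi-word given names are handled the same way, with no backtracking to worry about.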
Re: Regex Newbie Q: Non-Trivial Substitution and Modifying the Matched String
Veli-Pekka,

You're implying that the music macro language is pos() sensitive. That is a pretty severe problem in itself. Can you output the macro language in a format that is not position sensitive - do a global change - then process the macro language back into original format?

I think I'd be more inclined to use a $noteindex{$note} = $pitch hash table than a regex - then process the input one note at a time.

KenMc

On Oct 9, 2005, at 2:58 PM, Veli-Pekka Tätilä wrote:
> Hi, Yet another newbie question about regular expressions: I'd like to find and replace bits of text as usual. However, rather than replace all occurrences in one quick swoop using the s-operator and the g-flag, the replacement is so complex that it cannot be expressed as a straight substitution.
> [...]
RE: Regex Newbie Q: Non-Trivial Substitution and Modifying the MatchedString
"Veli-Pekka Tätilä" wrote, on Sun 10/9/2005 15:58:
: Yet another newbie question about regular expressions:
: I'd like to find and replace bits of text as usual. However, rather than
: replace all occurrences in one quick swoop using the s-operator and the
: g-flag, the replacement is so complex that it cannot be expressed as a
: straight substitution. So I would have to find a piece of text, process it
: in a separate function, and replace the matched text with the newly computed
: text. This goes on for n interesting matches in the input.

I can't tell from your note if you've investigated the /e flag yet; it allows you to replace a chunk of text with the result of a function call:

  s/(stuff that's not a note)?(note)/$1 . tag_from_note($2)/ge;

Good luck,
Joe
Re: Regex Newbie Q: Non-Trivial Substitution and Modifying the Matched String
Kenneth McNamara wrote:
> You're implying that the music macro language is pos() sensitive. That is a pretty severe problem in itself.

Hmm, I'm not totally sure if it is. But true, certain modifiers apply until the next note, such as one or more periods. My MML experience is actually from trying to write some tunes for the 8-bit Nintendo to see what that's like compared to ordinary MIDI and audio sequencing. Unfortunately, most of the MML docs I've seen are in Japanese and I only have some English tutorials related to the Nintendo.

This is not that big a hurdle in this project, though. I only took the inspiration for ASCII note data from MML; it's not going to be a fully-fledged MML parser or anything. I'm helping out a fellow musician to do singing synthesis using ordinary speech synths not originally meant for that purpose. Tiny Perl using Win32::FileOp and Win32::GuiTest seems to be great for this purpose. As a screen reader user I know nearly all Windows keyboard hotkeys by heart, so programmatically interacting with most GUI controls is a breeze.

> I think I'd be more inclined to use a $noteindex{$note} = $pitch hash table

I'm using one hash that maps from note names including sharps to their pitches, which are pre-computed at program startup. Then I just need to multiply the pitch to do an octave shift. If I were to parse the whole of the MML language I bet even a regexp could not be used to simply match the whole thing. I'm currently going through a tutorial on recursive descent parsers and writing the stuff in, you guessed it, Perl rather than Pascal. But this is getting OT, at least as far as my original problem goes, which was cleanly solved by at least two people already, nice.

--
With kind regards Veli-Pekka Tätilä ([EMAIL PROTECTED])
Accessibility, game music, synthesizers and programming:
http://www.student.oulu.fi/~vtatila/
Re: Regex Newbie Q: Non-Trivial Substitution and Modifying the MatchedString
Joe Discenza wrote:
> Veli-Pekka Tätilä wrote, on Sun 10/9/2005 15:58:
>> replacement is so complex that it cannot be expressed as a straight substitution. So I would have to find a piece of text, process it in a separate function, and replace the matched text with the newly computed text. This goes on for n interesting matches in the input.
> I can't tell from your note if you've investigated the /e flag yet; it allows you to replace a chunk of text with the result of a function call:
>
>   s/(stuff that's not a note)?(note)/$1 . tag_from_note($2)/ge;

Hi Joe,
The e-flag was just what I was looking for, thanks. Being Perl, I guessed there would be some easy way of achieving the desired effect. The perlop page is not that hierarchically structured so I missed the e-flag there. Partly because there are only a couple of lines about it, but that's what you get in a reference manual, grin.

--
With kind regards Veli-Pekka Tätilä
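For readers who have not used /e, a minimal sketch follows. Everything specific in it is an assumption for illustration: the pitch table values, the tag format, and the single-letter note syntax are invented, since the thread does not show the real MML input or the synthesizer markup.

```perl
use strict;
use warnings;

# Hypothetical pitch table (rounded Hz values), not taken from the thread.
my %pitch_of = (c => 262, d => 294, e => 330);

# Hypothetical tag format, for illustration only.
sub tag_from_note {
    my ($note) = @_;
    return qq{<pitch hz="$pitch_of{$note}"/>};
}

sub convert_notes {
    my ($mml) = @_;
    # /e evaluates the replacement as Perl code; /g repeats this for every
    # match, resuming after each match in the original text, so the longer
    # replacement is never rescanned.
    $mml =~ s/\b([cde])\b/tag_from_note($1)/ge;
    return $mml;
}

print convert_notes("c d r e"), "\n";
```

Tokens that are not notes (the r here) pass through untouched, and there is no need to track pos() by hand even though every replacement is longer than the text it replaces.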
Re: Regex Newbie Q: Non-Trivial Substitution and Modifying the Matched String
Not sure I'm getting it completely, but using match in a while loop w/ the /g modifier lets you process a string one match at a time:

  my $string = "Lots of words to be read one at a time.\nthough more than one line";
  while ( $string =~ /(\w+)/g ) {
    print "Found: $1\n";
    print "Processing ...\n";
  } # while /(\w+)/g

While the original string should be left alone (so pos and \G (the marker for the last match) don't get confused), you can process/chop up a copy of the string as you go.

To get really tricky, you can munge the string and matches via assignments to pos() - best to look at Mastering Regular Expressions (O'Reilly/Friedl - or better, buy it!), Chapter 7ish. That's not for the fainthearted.

  @nums = $data =~ m/\d+/g; # pick apart string, returning a list of numbers

Suppose the list of numbers starts *after* a marker - xx - prime the 'pos':

  $data =~ m/xx/g;          # prime the /g start, pos($data) now points to just after the xx
  @nums = $data =~ m/\d+/g; # pick apart the rest of the string, returning a list of numbers

Or:

  pos($data) = $i if $i = index($data, "xx"), $i >= 0; # find the xx
  @nums = $data =~ m/\d+/g; # pick apart the rest of the string, returning a list of numbers

The difference here is that pos is just before the xx, while in the previous it's just after, but ...

a

Andy Bach, Sys. Mangler
Internet: [EMAIL PROTECTED]
VOICE: (608) 261-5738 FAX 264-5932

History will have to record that the greatest tragedy of this period of social transition was not the strident clamor of the bad people, but the appalling silence of the good people.
Martin Luther King, Jr.
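The two techniques above (stepping through matches with /g, and priming pos() before a list-context match) can be wrapped up as below. The helper names are invented for this sketch.

```perl
use strict;
use warnings;

# Collect each word together with the offset just past its match.
sub words_with_pos {
    my ($string) = @_;
    my @found;
    while ($string =~ /(\w+)/g) {
        push @found, [$1, pos($string)];   # pos() is where the next /g match will start
    }
    return @found;
}

# Return the numbers that appear after a marker string.
sub numbers_after {
    my ($data, $marker) = @_;
    # A scalar-context /g match leaves pos($data) just after the marker.
    return unless $data =~ /\Q$marker\E/g;
    # The following /g match in list context continues from that position.
    my @nums = $data =~ /\d+/g;
    return @nums;
}

print join(",", numbers_after("1 2 xx 3 4", "xx")), "\n";
```

Note that pos() belongs to the variable being matched, so matching against a copy starts again from the beginning of that copy.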
Regex Newbie Q: Non-Trivial Substitution and Modifying the Matched String
Hi,
Yet another newbie question about regular expressions:

I'd like to find and replace bits of text as usual. However, rather than replace all occurrences in one quick swoop using the s-operator and the g-flag, the replacement is so complex that it cannot be expressed as a straight substitution. So I would have to find a piece of text, process it in a separate function, and replace the matched text with the newly computed text. This goes on for n interesting matches in the input.

Can I do this kind of thing in a simple loop, processing all matches one by one? My understanding is that pos and some special variables will tell me the character index of the match in a string. But if I then go and modify the string using substr to do the substitution, will it reset the search position to the beginning when trying to match the next interesting bit? The replacement text is by nature longer than the original, so the input string needs to grow on each substitution, which might present a problem to the matching operator. I took a look at perlop and some books I have on Perl but didn't end up with a definitive answer on how I should solve this problem.

That was the problem abstractly put; here's the specific instance: I'm writing a program to convert notes given in the music macro language to their equivalent pitches that are applied using markup tags for speech synthesizers. I can match a note easily, and have functions for computing the pitch and the tag in question. As the note values don't depend on each other in any way, I'd like to completely process one note at a time, doing the replacement, and then continue matching the next note where it previously left off. Naturally there are other tokens than just notes in the input, so I need to maintain the position of the note data in the input string. If I modify a separate copy of the string, it will throw off the pos() indices because the substitutions will change the length of the copy.

Any help appreciated as usual.

PS: If the problem statement is still a bit fuzzy or incomplete, just ask and I'll try to provide more info.

--
With kind regards Veli-Pekka Tätilä ([EMAIL PROTECTED])
Accessibility, game music, synthesizers and programming:
http://www.student.oulu.fi/~vtatila/
Re: Regex Newbie Q: Non-Trivial Substitution and Modifying the MatchedString
So what ur saying is that u want to do a lot of substitutions in one pass. U could always have one s///g for each thing u want substituted and run it n times. Can u post some actual strings that u want to parse and the substitutions? Then we can figure it out.

At 10:58 PM 10/9/05 +0300, Veli-Pekka Tätilä wrote:
> That was the problem abstractly put, here's the specific instance: I'm writing a program to convert notes given in the music macro language to their equivalent pitches that are applied using markup tags for speech synthesizers. I can match a note easily, and have functions for computing the pitch and the tag in question. As the note values don't depend on each other in any way, I'd like to completely process one note at a time, doing the replacement, and then continue matching the next note where it previously left off. Naturally there are other tokens than just notes in the input so I need to maintain the position of the note data in the input string. If I modify a separate copy of the string, it will throw off the pos() indices because the substitutions will change the length of the copy.
Re: Regex
Chris Wagner wrote: At 05:11 PM 9/27/05 -0700, $Bill Luebkert wrote: \s* means to grab any WS at the current position (including the case where there is none). \s*? means 0 or 1 of the above which is totally meaningless - you've already eaten all the WS with the \s*, so in my opinion the ? is redundant to what you have already done. I retract the above \s*? stmt. \s*? won't grab any WS in this case because (\s*?) is not the same as (\s*)? (which is what I was thinking). Redundant vs. Useless! Semantic battle of the century!! Who's right and who's wrong: and will be put to DEATH! Redundant: m/\s*\s/; # Specifying something again when it was already specified Useless: m/xyz\s*?$/;# Specifying something that does nothing * maximal match, eat up as many characters as possible to make the overall expression match *? minimal match, eat up as few characters as possible to make the overall expression match So in my revised opinion, it's not redundant, but it's not useless either - it's plain wrong. I'm sure the intent here is to eat as much WS as there is and that's not what \s*? will do for you. \s*? won't eat any WS. That rule doesn't disappear just because a certain character sequence was specified. *? is only *useful* when used with wildcards since it will decay to a nul if used with a fixed string. The minimum of the range 0 to inf is 0. -- ,-/- __ _ _ $Bill LuebkertMailto:[EMAIL PROTECTED] (_/ / )// // DBE CollectiblesMailto:[EMAIL PROTECTED] / ) /-- o // // Castle of Medieval Myth Magic http://www.todbe.com/ -/-' /___/__/_/_http://dbecoll.tripod.com/ (My Perl/Lakers stuff) ___ Perl-Win32-Users mailing list Perl-Win32-Users@listserv.ActiveState.com To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
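Since the greedy versus minimal question is easy to test, here is a small sketch that measures how much whitespace each form captures. The helper name and the test string are assumptions; the patterns are cut-down versions of the one under discussion.

```perl
use strict;
use warnings;

# Return what the first capture group grabbed, or undef if no match.
sub capture_ws {
    my ($pattern, $text) = @_;
    return $text =~ $pattern ? $1 : undef;
}

my $text = "library   card";    # three spaces between the words

my $greedy = capture_ws(qr/library(\s*)/,      $text);  # takes as much as it can
my $lazy   = capture_ws(qr/library(\s*?)/,     $text);  # nothing after it, so it takes nothing
my $forced = capture_ws(qr/library(\s*?)card/, $text);  # must reach "card", so it takes all three

printf "greedy=%d lazy=%d forced=%d\n", length $greedy, length $lazy, length $forced;
```

On this reading, \s*? followed by a literal such as card still consumes the intervening whitespace, because the overall match cannot succeed otherwise; the minimal form only makes a difference when what follows could also match earlier.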
Re: Regex
David Budd wrote:
> I thought this was working, but my logs just showed a case where it seems not to do what I want. Why does:
>   $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
> Not become true when $body contains:
>   Library Card: 0240742

i'd bet that it passes when you have some whitespace after the number and fails when it doesn't. that's a common problem and can be hard to trace.

in any event, your ending \D *requires* a non-digit character to immediately follow the number. since your example failed, it must be terminating the line. adding a ? or * to the \D will not behave as (I believe) you intend, i.e. it will not filter out numbers with 8 or more digits.

so IF your string always terminates the line (with or without whitespace), this will require exactly seven digits and not care whether or not whitespace follows:

  $OK_body = ($body =~ /library\s*card\s*:?\s*(\d{7})\s*$/i)

However... if the "library card: #" string does not always terminate the line and/or if you need to allow the possibility of non-digit characters immediately after the 7-digit number (example, library card: 1234567MORE_STUFF), then you will need to use a word boundary:

  $OK_body = ($body =~ /library\s*card\s*:?\s*(\d{7})\D*\b/i)

and by the way, *? is redundant. * means zero or more. ? means zero or one.

--rob
Re: Regex
At 12:08 AM 9/27/05 -0700, robert johnson wrote:
> and by the way, *? is redundant. * means zero or more. ? means zero or one.

Actually the *? construct is not a redundancy. It calls for a minimal match rather than a maximal match, which is the default. Although it was useless in the example. ;)
Re: Regex
Chris Wagner wrote:
> At 12:08 AM 9/27/05 -0700, robert johnson wrote:
>> and by the way, *? is redundant. * means zero or more. ? means zero or one.
> Actually the *? construct is not a redundancy. It calls for a minimal match rather than a maximal match, which is the default. Although it was useless in the example. ;)

Maybe "wrong" would be a better term for you? \s* means to grab any WS at the current position (including the case where there is none). \s*? means 0 or 1 of the above, which is totally meaningless - you've already eaten all the WS with the \s*, so in my opinion the ? is redundant to what you have already done.

--
$Bill Luebkert
Re: Regex
John wrote:
> David Budd wrote:
>> I thought this was working, but my logs just showed a case where it seems not to do what I want. Why does:
>>   $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
>> Not become true when $body contains:
>>   Library Card: 0240742
>> Just possibly there's some dodgy html or something in the original that doesn't make it through to my logs, but right now I'm perplexed
> Having looked at other replies to this, isn't it irrelevant what comes after the (\d{7}) part of the re.

Depends - I'd put a \b after it if you want to make sure there are no more characters in the field/number.

--
$Bill Luebkert
Re: Regex
At 02:42 PM 9/25/05 +1000, [EMAIL PROTECTED] wrote:
>> Why does:
>>   $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
>> Not become true when $body contains:
>>   Library Card: 0240742
> Having looked at other replies to this, isn't it irrelevant what comes after the (\d{7}) part of the re.

It's not true because the final \D requires that something be present *after* 7 digits are found.
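One way to express "exactly seven digits, and nothing has to follow" is a negative lookahead, which asserts that the next character is not a digit without requiring a character to be there. This is an alternative to the \b and $ suggestions in the thread, not something any poster proposed; the helper names are invented.

```perl
use strict;
use warnings;

# The pattern from the original question, for comparison.
sub original_pattern_matches {
    my ($body) = @_;
    return $body =~ /library\s*?card\D*?(\d{7})\D/i ? 1 : 0;
}

# Hypothetical helper: return the seven-digit card number, or undef.
sub card_number {
    my ($body) = @_;
    # (?!\d) is zero-width: it fails if a digit follows, but succeeds
    # at the very end of the string as well.
    return $body =~ /library\s*card\D*(\d{7})(?!\d)/i ? $1 : undef;
}

print card_number("Library Card: 0240742"), "\n";
```

With this form an eight-digit number is rejected, while a seven-digit number is accepted whether it is followed by a newline, other text, or nothing at all.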
Re: Regex
David Budd wrote:
> I thought this was working, but my logs just showed a case where it seems not to do what I want. Why does:
>   $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
> Not become true when $body contains:
>   Library Card: 0240742
> Just possibly there's some dodgy html or something in the original that doesn't make it through to my logs, but right now I'm perplexed

Having looked at other replies to this, isn't it irrelevant what comes after the (\d{7}) part of the re?

--
Regards
John McMahon ([EMAIL PROTECTED])
RE: Regex
> You state that there must be a NON numeric at end of line. I would have \D* or \D*$.

Excellent suggestion. I shall implement it forthwith.

Part of the reason I was perplexed was that this script ran for a year with nobody complaining. In fact, I discovered some time after I'd posted that it's not the regex that's failing, it's some execrable code doing the logging - I hadn't cleared a variable at the beginning of a loop, so the log messages sometimes referred to the previous iteration. I guess we were just lucky over the last year that on the odd occasions we had to sort problems by checking the logs, it didn't affect anything.
RE: Regex
At 09:07 AM 9/21/05 +0100, [EMAIL PROTECTED] wrote:
>> You state that there must be a NON numeric at end of line. I would have \D* or \D*$.
> Excellent suggestion. I shall implement it forthwith.

Actually, to make sure ur string ends with a non-numeric u need \D$ not \D*$. \D* matches 0 or more non-digits. That is used for cases where a string could end in \D but doesn't *have* to.

  C:\WINDOWS\Desktop>perl
  $bob = "abc123";
  print "non-number end\n" if $bob =~ m/.+\D*$/;
  ***
  non-number end
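The difference between the two endings can be checked directly. A short sketch, with invented helper names:

```perl
use strict;
use warnings;

# True only if the last character is a non-digit.
sub ends_with_non_digit { return $_[0] =~ /\D$/ ? 1 : 0 }

# \D* may match zero characters, so this is true for any string.
sub ends_with_optional_non_digit { return $_[0] =~ /\D*$/ ? 1 : 0 }

for my $s ("abc123", "abc123x") {
    printf "%-8s \\D\$=%d \\D*\$=%d\n", $s,
        ends_with_non_digit($s), ends_with_optional_non_digit($s);
}
```

One thing to keep in mind: a trailing newline counts as a non-digit, so whether the input has been chomped changes the result of the \D$ test.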
Regex
I thought this was working, but my logs just showed a case where it seems not to do what I want.

Why does:

  $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;

Not become true when $body contains:

  Library Card: 0240742

Just possibly there's some dodgy html or something in the original that doesn't make it through to my logs, but right now I'm perplexed

--
David Budd, Applications section, IT Services
Kilburn Building, University of Manchester
Tel 56033 Email [EMAIL PROTECTED]
http://www.its.man.ac.uk/applications
RE: Regex
David Budd wrote, on Tue 9/20/2005 10:57:
: I thought this was working, but my logs just showed a case where it seems not to do what I want.
: Why does:
:   $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
: Not become true when $body contains:
:   Library Card: 0240742

Probably because your regex explicitly requires a non-digit (\D) at the end, and your example line doesn't have it. Perhaps you want \D*$ so it finds no more than 7 digits, or if more digits are allowed as long as a non-digit intervenes, you might want (\D|$).

Good luck,
Joe
Re: Regex
The string $body (Library Card: 0240742) does not end in \D (not a number). You might want to add a * to that if you want to make sure it matches strings that _do_ end in \D (i.e. \n or \r\n or whatever stuff comes behind the ID) and those that end in the ID itself.

- Original Message -
From: David Budd [EMAIL PROTECTED]
To: perl-win32-users@listserv.ActiveState.com
Sent: Tuesday, September 20, 2005 4:57 PM
Subject: Regex

> I thought this was working, but my logs just showed a case where it seems not to do what I want. Why does:
>   $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
> Not become true when $body contains:
>   Library Card: 0240742
> Just possibly there's some dodgy html or something in the original that doesn't make it through to my logs, but right now I'm perplexed
Re: Regex
David Budd wrote:
> I thought this was working, but my logs just showed a case where it seems not to do what I want. Why does:
>   $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;

What's the last \D for? '\s*?' should just be '\s*' - same with \D*?.

> Not become true when $body contains:
>   Library Card: 0240742
> Just possibly there's some dodgy html or something in the original that doesn't make it through to my logs, but right now I'm perplexed

  use strict;
  use warnings;

  my $body = 'Library Card: 0240742';
  my $OK_body = ($body =~ /library\s*card\D*(\d{7})/i);
  print "OK_body = ", $OK_body ? 'true' : 'false', "\n";

  __END__

--
$Bill Luebkert
RE: Regex
[EMAIL PROTECTED] wrote:
> I thought this was working, but my logs just showed a case where it seems not to do what I want. Why does:
>   $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
> Not become true when $body contains:
>   Library Card: 0240742
> Just possibly there's some dodgy html or something in the original that doesn't make it through to my logs, but right now I'm perplexed

You state that there must be a NON numeric at end of line. I would have \D* or \D*$.

Wags ;)
RE: Regex
David wrote:
> I thought this was working, but my logs just showed a case where it seems not to do what I want. Why does:
>   $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
> Not become true when $body contains:
>   Library Card: 0240742
> Just possibly there's some dodgy html or something in the original that doesn't make it through to my logs, but right now I'm perplexed

Is there a chance that 'Library Card: 0240742' comes at the end of the file? Or that you used chomp on the variable $body? The \D at the end of the regular expression would not work if there wasn't a nondigit at the end of the line.
Re: regex expression to determine if i have a valid email!!
bruce wrote:
> hi...
>
> i've got a php app, and i'm trying to figure out how/where to turn to to get a good working regex in order to determine if i have a valid email address
>
> any help/thoughts/etc.. would be seriously helpful... i've come across a great many preg_match functions for php, but i haven't run across one that works all the time!!
>
> i finally figured that someone here, might have the exact soln to my prob!

Try these links for help:

  http://search.cpan.org/src/MAURICE/Email-Valid-0.15/Valid.pm
  http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz

--
$Bill Luebkert
Re: regex expression to determine if i have a valid email!!
Chris Wagner wrote:
> Well it very much depends on what u consider a valid email address. Because technically, anything can be valid in some context. What u probably want here is a fully qualified Internet mail address. The basic form of this would be m/[EMAIL PROTECTED]/.

The protocol expects a path like this to the mail server:

<path> ::= "<" [ <a-d-l> ":" ] <mailbox> ">"

The mailbox portion is what you would normally see at the app. See <mailbox> below:

<a-d-l> ::= <at-domain> | <at-domain> "," <a-d-l>
<at-domain> ::= "@" <domain>
<domain> ::= <element> | <element> "." <domain>
<element> ::= <name> | "#" <number> | "[" <dotnum> "]"
<mailbox> ::= <local-part> "@" <domain>
<local-part> ::= <dot-string> | <quoted-string>
<name> ::= <a> <ldh-str> <let-dig>
<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
<let-dig> ::= <a> | <d>
<let-dig-hyp> ::= <a> | <d> | "-"
<dot-string> ::= <string> | <string> "." <dot-string>
<string> ::= <char> | <char> <string>
<quoted-string> ::= """ <qtext> """
<qtext> ::= "\" <x> | "\" <x> <qtext> | <q> | <q> <qtext>
<char> ::= <c> | "\" <x>
<dotnum> ::= <snum> "." <snum> "." <snum> "." <snum>
<number> ::= <d> | <d> <number>
<CRLF> ::= <CR> <LF>
<CR> ::= the carriage return character (ASCII code 13)
<LF> ::= the line feed character (ASCII code 10)
<SP> ::= the space character (ASCII code 32)
<snum> ::= one, two, or three digits representing a decimal integer value in the range 0 through 255
<a> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case
<c> ::= any one of the 128 ASCII characters, but not any <special> or <SP>
<d> ::= any one of the ten digits 0 through 9
<q> ::= any one of the 128 ASCII characters except <CR>, <LF>, quote ("), or backslash (\)
<x> ::= any one of the 128 ASCII characters (no exceptions)
<special> ::= "<" | ">" | "(" | ")" | "[" | "]" | "\" | "." | "," | ";" | ":" | "@" | """ | the control characters (ASCII codes 0 through 31 inclusive and 127)

> If u want to limit that to known legitimate MX's u can do DNS lookups on the domain part.
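For what it's worth, part of the grammar quoted above can be turned into a Perl pattern fairly directly. What follows is only a rough sketch under stated simplifications: it handles a local part in the dot-string form without backslash escapes, it skips quoted strings, and it accepts only domains made of names (no "#" number or bracketed dotnum elements). The sub name looks_like_mailbox is made up for the illustration.

```perl
use strict;
use warnings;

# c: any printable ASCII character that is not a "special" or a space
my $c          = qr/(?![<>()\[\]\\.,;:\@"])[\x21-\x7E]/;
# dot-string: runs of c separated by single dots
my $dot_string = qr/$c+(?:\.$c+)*/;
# name: a letter, one or more letters/digits/hyphens, then a letter or digit
my $name       = qr/[A-Za-z][A-Za-z0-9-]+[A-Za-z0-9]/;
# domain: names separated by dots
my $domain     = qr/$name(?:\.$name)*/;

# Hypothetical helper: true if the whole string is dot-string "@" domain
sub looks_like_mailbox {
    my ($addr) = @_;
    return $addr =~ /\A$dot_string\@$domain\z/ ? 1 : 0;
}

for my $addr ('bruce@example.com', 'a..b@example.com', 'no-at-sign') {
    printf "%-20s %s\n", $addr,
        looks_like_mailbox($addr) ? 'matches' : 'does not match';
}
```

Note that the name rule as written in the grammar needs at least three characters per domain label, so this sketch is stricter than real-world mail systems in that respect.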
RE: regex expression to determine if i have a valid email!!
> i've got a php app

Say it isn't so! :)

> i finally figured that someone here, might have the exact soln to my prob!

1. Make sure it is formatted correctly, as per RFC 822. To do that, there is one big scary regex created by Jeffrey Friedl (author of Mastering Regular Expressions). If you check the link that $Bill gave you to Email::Valid, the regular expression is in there.
2. Make sure the domain is valid, and resolves to an IP address.
3. Make sure an MX record is defined for the domain, and the mail host is accessible.
4. (Optional) Check the IP against selected blacklist(s).

Of course, the easiest way to do this is Perl. Email::Valid does 1-3 for you.

- Mark.
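A small sketch of what calling Email::Valid could look like. The module comes from CPAN rather than core Perl, so this loads it at run time and simply reports nothing useful if it is missing. The check_address wrapper and its $check_mx flag are names invented for this example; -address and -mxcheck are the module's own options.

```perl
use strict;
use warnings;

# Email::Valid is a CPAN module; load it lazily so the script still runs
# where it has not been installed.
my $have_ev = eval { require Email::Valid; 1 };

# Hypothetical wrapper: returns the address on success, undef otherwise.
# With $check_mx true the module also does a DNS lookup (needs network access).
sub check_address {
    my ($addr, $check_mx) = @_;
    return undef unless $have_ev;
    return scalar Email::Valid->address(
        -address => $addr,
        -mxcheck => $check_mx ? 1 : 0,
    );
}

for my $addr ('bruce@example.com', 'not an address') {
    my $ok = check_address($addr);
    print "$addr: ", (defined $ok ? "ok ($ok)" : 'rejected or module missing'), "\n";
}
```

The DNS check is left off by default here so the example runs offline; passing a true second argument turns it on.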
Re: regex expression to determine if i have a valid email!!
On 9/16/05, Chris Wagner [EMAIL PROTECTED] wrote:
> Well it very much depends on what u consider a valid email address. Because technically, anything can be valid in some context. What u probably want here is a fully qualified Internet mail address. The basic form of this would be m/[EMAIL PROTECTED]/. If u want to limit that to known legitimate MX's u can do DNS lookups on the domain part.
>
> At 08:23 PM 9/15/05 -0700, [EMAIL PROTECTED] wrote:
>> hi... i've got a php app, and i'm trying to figure out how/where to turn to to get a good working regex in order to determine if i have a valid email address
>> any help/thoughts/etc.. would be seriously helpful...
>> i've come across a great many preg_match functions for php, but i haven't run across one that works all the time!!

Here's an article that discusses true validation: http://coveryourasp.com/ValidateEmail.asp

-- pDale
Quando Omni Flunkus Moritati. (When all else fails, play dead.) -- Red Green
RE: regex expression to determine if i have a valid email!!
so mark... what you're saying is that i should write a simple perl app that does the email validation, and call it from the php app, passing it the email address to check?

-bruce
RE: regex expression to determine if i have a valid email!!
Sure. For extra coolness, you could do this via Ajax so that it is validated on the fly while the user is still filling out the form.

- Mark.
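One way the "small Perl app called from PHP" idea could look, as a sketch only. The script takes the address as its single argument and prints "valid" or "invalid", which the PHP side could read (for instance via shell_exec, with the argument passed through escapeshellarg). The check inside is a deliberately loose placeholder and the sub names are made up; a real version would plug in whatever validator you settle on.

```perl
use strict;
use warnings;

# Placeholder check (an assumption of this sketch): exactly one "@" with
# non-blank text on both sides. Swap in a stricter validator as needed.
sub is_plausible_address {
    my ($addr) = @_;
    return defined $addr && $addr =~ /\A[^\@\s]+\@[^\@\s]+\z/ ? 1 : 0;
}

# Prints the verdict for the caller to read and returns an exit status:
# 0 = valid, 1 = invalid, 2 = wrong number of arguments.
sub run {
    my (@args) = @_;
    if (@args != 1) {
        print STDERR "usage: $0 address\n";
        return 2;
    }
    my $ok = is_plausible_address($args[0]);
    print $ok ? "valid\n" : "invalid\n";
    return $ok ? 0 : 1;
}

my $status = run(@ARGV);
# A standalone script would finish with: exit $status;
```

Keeping the logic in run() rather than at top level means the same file can be reused from a CGI handler if you go the Ajax route later.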
RE: regex expression to determine if i have a valid email!!
There is also the address-checking expression in the very nice Mail-Sendmail module.
regex expression to determine if i have a valid email!!
hi... i've got a php app, and i'm trying to figure out how/where to turn to to get a good working regex in order to determine if i have a valid email address any help/thoughts/etc.. would be seriously helpful... i've come across a great many preg_match functions for php, but i haven't run across one that works all the time!! i finally figured that someone here, might have the exact soln to my prob! thanks -bruce [EMAIL PROTECTED]
RE: regex expression to determine if i have a valid email!!
bruce wrote: : i've got a php app, and i'm trying to figure out how/where to : turn to to get a good working regex in order to determine if i : have a valid email address : : any help/thoughts/etc.. would be seriously helpful... : : i've come across a great many preg_match functions for php, but : i haven't run across one that works all the time!! I think this works if you remove comments from the address first. Email will probably screw it up. Search google for it. (?:(?:\r\n)?[ \t])*(?:(?:(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*)*\(?:(?:\r\n)?[ \t])*(?:@(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ 
\t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\(?:(?:\r\n)?[ \t])*)|(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*)*\(?:(?:\r\n)?[ \t])*(?:@(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ 
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\(?:(?:\r\n)?[ \t])*)(?:,\s*(?:(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[ \t]))*(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r
Re: regex expression to determine if i have a valid email!!
Well it very much depends on what u consider a valid email address. Because technically, anything can be valid in some context. What u probably want here is a fully qualified Internet mail address. The basic form of this would be m/[EMAIL PROTECTED]/. If u want to limit that to known legitimate MX's u can do DNS lookups on the domain part.

At 08:23 PM 9/15/05 -0700, [EMAIL PROTECTED] wrote:
> hi... i've got a php app, and i'm trying to figure out how/where to turn to to get a good working regex in order to determine if i have a valid email address
> any help/thoughts/etc.. would be seriously helpful...
> i've come across a great many preg_match functions for php, but i haven't run across one that works all the time!!
> i finally figured that someone here, might have the exact soln to my prob!

-- REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =-- ...ne cede malis 0100
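A rough sketch of the "DNS lookup on the domain part" idea, using only core Perl. One caveat: the built-in gethostbyname looks up an address record, not an MX record, so this is a weaker check than the one described; a real MX query needs a resolver library from CPAN. The sub names and the optional resolver argument are inventions of this example.

```perl
use strict;
use warnings;

# Pull out the text after the last "@", lowercased; undef if there is none.
sub domain_of {
    my ($addr) = @_;
    return $addr =~ /\@([^\@\s]+)\z/ ? lc $1 : undef;
}

# True if the resolver finds the host. The default resolver uses the core
# gethostbyname, which checks for an address record rather than an MX record.
sub domain_resolves {
    my ($domain, $resolver) = @_;
    $resolver ||= sub {
        my $packed = gethostbyname($_[0]);
        return defined $packed;
    };
    return $resolver->($domain) ? 1 : 0;
}

my $domain = domain_of('bruce@example.com');
# A stub resolver keeps this demo offline; drop the second argument
# to do a real lookup.
print "$domain resolves\n" if domain_resolves($domain, sub { 1 });
```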