Re: trouble doing regex in file containing both ascii and binary content

2014-02-15 Thread sisyphus1
Hi Greg,

This list is all but dead – it may be that you and I are the only people 
receiving mail from it.
Much better, IMO, to post these types of questions to perlmonks.

Anyway ... this might help:

#
use strict;
use warnings;

my $str = "\x1F\x8B\x08";

print "String contains: $str\n";

open WR, '>', 'file.bin' or die $!;
binmode WR;
print WR $str;
close WR or die $!;

undef $/;

open RD, '<', 'file.bin' or die $!;
binmode RD;
my $contents = <RD>;
close RD or die $!;

if($contents =~ /$str/){print "ok 1\n"}

# To safeguard against presence of
# metacharacters in $str:

if($contents =~ /\Q$str\E/){print "ok 2\n"}
##
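
Going one step further toward the stated goal (pulling the gzipped part out of the capture and reading it), here is a minimal sketch. It assumes the IO::Compress modules are available (they ship with Perl 5.10 and later). The sub name extract_gzip and the sample headers and body are made up for illustration; a real capture would be read from the saved file in binmode instead.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Return everything from the first gzip signature onward, or undef if absent.
# (extract_gzip is a hypothetical helper name, not something from the thread.)
sub extract_gzip {
    my ($data) = @_;
    my $pos = index($data, "\x1F\x8B\x08");   # literal byte search, no regex needed
    return $pos < 0 ? undef : substr($data, $pos);
}

# Stand-in for the captured stream: made-up headers plus a gzipped body.
my $body = "<p>hello from the server</p>";
gzip(\$body => \my $gz) or die "gzip failed: $GzipError";
my $stream = "HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n" . $gz;

my $found = extract_gzip($stream);
die "no gzip signature found\n" unless defined $found;

# Decompress from one in-memory buffer to another.
gunzip(\$found => \my $plain) or die "gunzip failed: $GunzipError";
print "$plain\n";
```

index() is used rather than a regex because the signature is a fixed byte string, so there are no metacharacters to worry about.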

Cheers,
Rob

___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


Re: trouble doing regex in file containing both ascii and binary content

2014-02-15 Thread Richie
I haven't had time to test my theory so I didn't respond. Since he had
said text and binary, my thoughts were that the regex would not match
past the first linefeed and would need to be updated accordingly.
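
A quick check of that theory, as a sketch with made-up sample data: a pattern made only of literal bytes does not seem to be affected by line feeds, because only the . metacharacter refuses to match a newline (and only without /s).

```perl
use strict;
use warnings;

# Made-up sample: headers with CRLF line endings, then the signature bytes.
my $data = "Content-Encoding: gzip\r\nContent-Length: 201\r\n\r\n\x1F\x8B\x08\x00";

my $literal = $data =~ /\x1F\x8B\x08/    ? 1 : 0;  # no '.', so newlines are irrelevant
my $dot     = $data =~ /gzip.*\x1F\x8B/  ? 1 : 0;  # '.' stops at "\n" without /s
my $dot_s   = $data =~ /gzip.*\x1F\x8B/s ? 1 : 0;  # /s lets '.' cross newlines

print "literal=$literal dot=$dot dot_s=$dot_s\n";
```

So /s would only matter if the pattern had to span the headers with . in it; the three-byte pattern in the original script has no . at all.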



trouble doing regex in file containing both ascii and binary content

2014-02-14 Thread Greg VisionInfosoft
i cant figure out what im doing wrong here.
i ran wireshark to monitor a small http client/server query/response.
point of exercise is to see exactly what an ajax response looks like (as im
trying to learn ajax).

unfortunately, the ajax response is sent from server in 'gzip' format (not
plain text).

so wireshark shows two standard http headers and at the end of the stream
is the binary 'gzipped' small stream.

ive saved this wireshark tcp 'stream' to a file.  viewing the file in hex
mode, i see clearly the first three binary bytes of the gzipped stream are
hex1F hex8B hex08

what i need to do next is save just the binary gzipped stream to a stand
alone file, then see if i can un-gzip it to read the plain text contents.

in theory, a straight forward task.

i write a quick few line perl script, whereby i open the saved wireshark
tcp stream file, set this input file to binary mode (so as to not change
any internal binary byte values), undefine the input line separator (to
slurp the entire file into memory when read), read the file to slurp its
contents into a var, do a simple pattern match of \x1F\x8B\x08, then save
the matched pattern $& and what follows the match $' to a new file...
(right now the script doesnt actually yet output to a file, it just dumps
to screen)

for reasons that elude me, the pattern match fails.

i know the 3 bytes are in the file, yet the pattern match to those 3 bytes
fails.

any ideas?

heres the small script.

open(IN, $ARGV[0]) || die "cant open input file";
binmode(IN);

undef $/;

my $data = <IN>;

if ($data =~ /\x1F\x8B\x08/) {
  print "matched: " . $& . $';
} else {
  print "no match\n";
}
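
The thread never establishes why the match failed. One possibility, offered only as an assumption, is that the stream was saved from a text rendering in which non-printable bytes had already been replaced by dots, so the bytes 1F 8B 08 were never in the file the script read. A small sketch for checking what the script actually sees (hex_dump is a hypothetical helper name, and both sample strings are made up):

```perl
use strict;
use warnings;

# Render a byte string as space-separated hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', unpack('(H2)*', $bytes);
}

my $raw      = "\r\n\r\n\x1F\x8B\x08\x00";   # what a raw capture would hold
my $rendered = "\r\n\r\n....";               # dots, as in a text rendering

print hex_dump($raw), "\n";
print hex_dump($rendered), "\n";
```

Running hex_dump on the first few bytes after the blank line that ends the response headers would show whether the signature is really there.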


the contents of the wireshark stream is as follows...


POST /ajax/demo_post.asp HTTP/1.1
Host: www.w3schools.com
Connection: keep-alive
Content-Length: 0
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML,
like Gecko) Chrome/32.0.1700.107 Safari/537.36
Origin: http://www.w3schools.com
Accept: */*
Referer: http://www.w3schools.com/ajax/tryajax_post.htm
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cookie: ASPSESSIONIDAASDBBTC=BFEPJKCDLGDHEEOJIKANOEHP


HTTP/1.1 200 OK
Cache-Control: private,public
Content-Type: text/html
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Fri, 14 Feb 2014 21:03:48 GMT
Content-Length: 201

.`.I.%/m.{.J.J..t...`.$..@.iG#).*..eVe]f.@
..{{;.N'...?\fd.l..J...!?~|.?...V.6_..U..u...y...t./_.I.y;.f..wWG.qBo..
..Q.www.~..h.../..h.c...


note; the binary data at end is obviously not easily discerned here in
ascii mode.  when i open this same file in a binary editor the actual
binary contents (displayed in hex) is as follows... (ive inserted an extra
space to make the hex values be easily discerned).

1f 8b 08 00 00 00 00 00 04 00 ed bd 07 60 1c 49 96 25 26 2f 6d ca 7b 7f 4a
f5 4a d7 e0 74 a1 08 80 60 13 24 d8 90 40 10 ec c1 88 cd e6 92 ec 1d 69 47
23 29 ab 2a 81 ca 65 56 65 5d 66 16 40 cc ed 9d bc f7 de 7b ef bd f7 de 7b
ef bd f7 ba 3b 9d 4e 27 f7 df ff 3f 5c 66 64 01 6c f6 ce 4a da c9 9e 21 80
aa c8 1f 3f 7e 7c 1f 3f 22 1e af 8e de cc 8b 26 9d 56 cb 36 5f b6 e9 55 d6
a4 75 fe 8b d6 79 d3 e6 b3 74 dd 14 cb 8b b4 9d e7 e9 cb 2f 5f bf 49 17 79
3b af 66 e3 c7 77 57 47 bf 71 42 6f be b2 0d b3 f6 51 ba 77 77 77 ff ee de
ce ee 7e ba ff 68 e7 de a3 fd 87 e9 cb 2f d0 f4 ff 01 a8 9f 68 15 63 00 00
00


Re:Problem with regex

2011-11-17 Thread Barry Brevik
 Nevertheless let me also try to explain the issue that you had with 
 the regex as this can come up in other situations.

 First, I'd probably use plain " instead of \x22 as that will be 
 probably easier to the reader to know what are you looking for.

Wow. That is an incredible post.

Yes, I've been convinced to use Text::CSV, but for some reason the
ActiveState ppm does not actually install it. It complains about not
being able to find some other module that it depends on.

I'm amazed that you have executed the train of thought expressed in your
post. I have been doing Perl for 12 years but obviously have failed to
grasp some of the true power of more complicated expressions. I did
intuit that I needed to use ?, but did not do it the way you did, so it
did not work as expected.

I appreciate the time you spent compiling your post. Sometimes I feel
like I'm the only one posting questions to the list, and it alarms me
that the traffic is so low... I would really regret having this list go
dormant, as I have learned so much, especially from reading threads I
did not post. And the people are really friendly.

Barry Brevik


RE: Re:Problem with regex

2011-11-17 Thread Brian Raven
 -Original Message-
 From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-
 win32-users-boun...@listserv.activestate.com] On Behalf Of Barry Brevik
 Sent: 17 November 2011 17:29
 To: perl Win32-users
 Subject: Re:Problem with regex

 ...

 Yes, I've been convinced to use Text::CSV, but for some reason the
 ActiveState ppm does not actually install it. It complains about not
 being able to find some other module that it depends on.

Don't know if it helps, but I seem to have installed Text::CSV_XS on 
Activestate 5.14.1 build 1401.


--
Brian Raven






Re:Problem with regex

2011-11-17 Thread Barry Brevik
 $csvLine =~ s/(".+?),(.+?")/$1|$2/g;

For some reason this substitution does not seem to work all the time,
depending on which fields have commas in them.

I finally tinkered my way into this:
  $csvLine =~ s/("[^,]+?),([^,]+?")/$1|$2/g;

...which seems to work a little better, but will not deal with spaces
between fields, which are not supposed to be there anyway.
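
A possible explanation for the "does not work all the time" behaviour, sketched below with a made-up input line: when a quoted field contains no comma, a lazy pattern of the form (".+?),(.+?") keeps going past the closing quote until it finds a comma, so it can replace a real field separator. The alternative shown here first matches each quoted field as a whole and then translates commas only inside it. pipe_quoted_commas is a hypothetical name, and the sketch assumes there are no escaped quotes inside fields. It avoids the /r modifier so it should also run on older Perls such as 5.8.

```perl
use strict;
use warnings;

# Replace commas with '|' only inside double-quoted fields.
# Assumes fields never contain an escaped double quote.
sub pipe_quoted_commas {
    my ($line) = @_;
    $line =~ s{("[^"]*")}{ (my $field = $1) =~ tr/,/|/; $field }ge;
    return $line;
}

# Made-up sample: the first quoted field has no comma in it.
my $mixed = q{"no comma" , plain , "has,comma"};

# The lazy pattern can run past a closing quote and hit a separator instead.
(my $lazy = $mixed) =~ s/(".+?),(.+?")/$1|$2/g;
print "lazy pattern: $lazy\n";

print "per field...: ", pipe_quoted_commas($mixed), "\n";
```

A real CSV parser is still the safer choice once escaped quotes or embedded newlines can appear.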

Barry Brevik




RE: Re:Problem with regex

2011-11-17 Thread Barry Brevik
 Don't know if it helps, but I seem to have installed 
 Text::CSV_XS on Activestate 5.14.1 build 1401.

I'm on Perl 5.8.8 because my Perl DevKit only works on that version.

Barry Brevik


Re: Problem with regex

2011-11-10 Thread Gabor Szabo
Hi Barry,

On Thu, Nov 10, 2011 at 2:34 AM, Barry Brevik bbre...@stellarmicro.com wrote:
 Below is some test code that will be used in a larger program.

 In the code below I have a regular expression whose intent is to look
 for " 1 or more characters , 1 or more characters " and replace the
 comma with |. (the white space is just for clarity).

 IAC, the regex works, that is, it matches, but it only replaces the
 final match. I have just re-read the camel book section on regexes and
 have tried many variations, but apparently I'm too close to it to see
 what must be a simple answer.

 BTW, if you guys think I'm posting too often, please say so.

 Barry Brevik
 
 use strict;
 use warnings;

 my $csvLine = qq|  "col , 1"  ,  col___'2' ,  col-3, "col,4"|;

 print "before comma substitution: $csvLine\n\n";

 $csvLine =~ s/(\x22.+),(.+\x22)/$1|$2/s;

 print "after comma substitution.: $csvLine\n\n";


Tobias already gave you a solution, and
I also think using Text::CSV or Text::CSV_XS is way better for this task
than plain regexes. For example, one day you might encounter
a line that has an embedded " escaped using \.
Then even if your regex worked earlier, this can kill it.
And what if there was a | in the original string?


Nevertheless let me also try to explain the issue that you had
with the regex as this can come up in other situations.

First, I'd probably use a plain " instead of \x22, as that will probably
be easier for the reader to see what you are looking for.

Second, the /s at the end probably has no value. It only changes
the behavior of . to also match newlines. If you don't have newlines in
your string (e.g. because you are processing a file line by line)
then the /s has no effect. That makes this expression:

$csvLine =~ s/(".+),(.+")/$1|$2/;

Then, before going on, you need to check what this really matches, so
I replaced the above with

 if ($csvLine =~ s/(".+),(.+")/$1|$2/s ){
   print "match: $1$2\n";
 }

and got

match: "col , 1"  ,  col___'2' ,  col-3, "col4"

You see, the .+ is greedy: it matches from the first " as much as it can.
You'd be better off telling it to match as little as possible by adding
an extra ? after the quantifier.

 if ($csvLine =~ /(".+?),(.+?")/ ){
   print "match: $1$2\n";
 }

prints this:
match: "col  1"

Finally you need to do the substitution globally, so not only once but
as many times as possible:

 $csvLine =~ s/(".+?),(.+?")/$1|$2/g;

And the output is

after comma substitution.:   "col | 1"  ,  col___'2' ,  col-3, "col|4"


But again, for CSV files that can have embedded commas, it is better to use
one of the real CSV parsers.

regards
  Gabor

-- 
Gabor Szabo
http://szabgab.com/perl_tutorial.html


Problem with regex

2011-11-09 Thread Barry Brevik
Below is some test code that will be used in a larger program.

What I am trying to do is process lines from a CSV file where some of
the 'cells' have commas embedded in them (see sample code below). I might
have used Text::CSV but as far as I can tell that module also can not
deal with embedded commas.

In the code below I have a regular expression whose intent is to look
for " 1 or more characters , 1 or more characters " and replace the
comma with |. (the white space is just for clarity).

IAC, the regex works, that is, it matches, but it only replaces the
final match. I have just re-read the camel book section on regexes and
have tried many variations, but apparently I'm too close to it to see
what must be a simple answer.

BTW, if you guys think I'm posting too often, please say so.

Barry Brevik

use strict;
use warnings;
 
my $csvLine = qq|  "col , 1"  ,  col___'2' ,  col-3, "col,4"|;
 
print "before comma substitution: $csvLine\n\n";
 
$csvLine =~ s/(\x22.+),(.+\x22)/$1|$2/s;
 
print "after comma substitution.: $csvLine\n\n";



RE: Problem with regex

2011-11-09 Thread Tobias Hoellrich
The whitespaces around the separator characters are not allowed in strict CSV. 
Try this below.

Cheers - Tobias

use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ allow_whitespace => 1 });
open my $fh, "<&DATA" or die "Can't access DATA: $!\n";
while (my $row = $csv->getline($fh)) {
print join("\n",@$row),"\n";
}
$csv->eof or $csv->error_diag();

__END__
"col , 1"  ,  col___'2' ,  col-3, "col,4"



Help with regex

2011-06-30 Thread Barry Brevik
I am trying to truncate a string so that it is only 39 characters long.
The application is a label printing routine, and the label is only long
enough to print 39 characters.
 
I tried this (and many iterations), but it returns the entire string
every time.
 
Can anyone see what I'm doing wrong, or maybe suggest a better way?
 
use strict;
use warnings;

my $txt = 'This is a string that is longer than thirty nine characters used for testing.';
print "\nRunning a test of grabbing the 1st 39 characters of a string.\n";
print "Test string.: $txt\n";
 
$txt =~ s/^(.{1,39})/$1/;
 
print "Resulting string: $txt\n";

Barry Brevik


RE: Help with regex

2011-06-30 Thread Tobias Hoellrich
$txt =~ s/^(.{1,39}).*$/$1/;

or 

$txt = substr($txt,0,39);

--T
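
To spell out why the original attempt left the string untouched, here is a small sketch comparing the three forms side by side. The variable names are illustrative only, and /s is added to the regex version as an assumption so that it also copes with embedded newlines.

```perl
use strict;
use warnings;

my $txt = 'This is a string that is longer than thirty nine characters used for testing.';

# Original attempt: the matched text is replaced by itself, so nothing changes.
(my $unchanged = $txt) =~ s/^(.{1,39})/$1/;

# Matching the rest of the string as well lets the substitution discard it.
(my $by_regex = $txt) =~ s/^(.{1,39}).*$/$1/s;

# substr states the intent directly and needs no pattern at all.
my $by_substr = substr($txt, 0, 39);

print length($unchanged), " ", length($by_regex), " ", length($by_substr), "\n";
print "$by_substr\n";
```

substr also returns the whole string unchanged when it is already shorter than 39 characters, so no length check is needed first.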



RE: Help with regex

2011-06-30 Thread Joe Discenza
Barry,

: I am trying to truncate a string so that it is only 39 characters long.
: The application is a label printing routine, and the label is only long
: enough to print 39 characters.

Wrong tool. Look for substr.

Joe

Joseph Discenza
Senior Analyst/Software Developer 

  
1251 N. Eddy Street, Suite 202
South Bend, IN 46617- 1478
Phone: 574.243.6040 Ext. 233    
Fax:  574-243-6060

www.carletoninc.com
Visit our blog at:  carletoncompliance.blogspot.com



Help with regex

2011-06-30 Thread Barry Brevik
Wow, thank you all for the many replies I received!!
 
Barry Brevik


can a regex pattern match return the starting position of the match?

2011-04-14 Thread Greg Aiken
given how smart perl is, I was thinking there must be a function within perl
whereby if one does a pattern match against a scalar, that in addition to
having regex being able to return such built in vars as: $` (what precedes
the match), $' (what follows the match), $1, etc.

is there a built in var that returns the position within the scalar where
the match occurred?

of course, if not, one may always evaluate length($`).

I was just curious



Re: can a regex pattern match return the starting position of the match?

2011-04-14 Thread Conor
Greg-

This question was answered on Stack Overflow:
http://stackoverflow.com/questions/87380/how-can-i-find-the-location-of-a-regex-match-in-perl

brian d foy's answer seems to be the best:

The built-in variables @- and @+ hold the start and end positions,
respectively, of the last successful match. $-[0] and $+[0] correspond to
entire pattern, while $-[N] and $+[N] correspond to the $N ($1, $2, etc.)
submatches.
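
For illustration, a minimal sketch of reading the match offsets from @- and @+. The helper name match_span and the sample string are made up for this example; only the two built-in arrays come from the answer quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start, end) offsets of the first match,
# or an empty list when the pattern does not match.
sub match_span {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    # $-[0] is the offset where the whole match starts,
    # $+[0] is the offset just past its end.
    return ($-[0], $+[0]);
}

my ($start, $end) = match_span('header bytes then gzip data', qr/gzip/);
print "start=$start end=$end\n";   # start=18 end=22
```

The start offset is the same number that length($`) would give, without having to touch $`.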

-Conor



RE: regex like option *values*

2011-03-07 Thread Brian Raven
 -Original Message-
 From: p sena [mailto:senapati2...@yahoo.com]
 Sent: 05 March 2011 05:34
 To: perl-win32-users@listserv.ActiveState.com; Brian Raven
 Subject: RE: regex like option *values*
 __DATA__
 abc0[1-9].ctr.[pad,spd].set.in
 abc[01-22].ctr.[pad,spd].set.in
 abcL[1,2,3].ctr.[pad,spd].set.in
 abcL[1,2,3].ctr.[pad,spd].set.in
 abcL[1,2,3].ctr.[70,001].set.in

 ---

 It should work for lists of ranges, and
 ranges of
 strings as well as
 numbers.

 Regarding incorporating into Getopt::Long,
 see the
 Tips and Tricks
 section of the doco.

 Brian,

 Can this solution be generalized in a way to support
 --option_value=abc0[1-9].ctr.[pad,spd].set.in,xxx0[2-8].mmm.[rst,spd].afr.org
 types? Means those __DATA__ lines all appear in one line
 separated by comma as above (instead of newline separated). Should it
 be efficient to do in the expand_string() or from the main while
 iteration just before calling expand_string.

 Replying back with a solution I can see. In case of such option value
 supplies it becomes difficult to do the similar thing as below-
 GetOptions ("library=s" => \@libfiles);
 @libfiles = split(/,/,join(',',@libfiles));
 Such mixed strings can be parsed and returned as a list as below. In
 our context, to be called from the main before the while iteration.
 After that this list's elems can be passed on to the expand_xxx
 routine(s) one by one.

 # Arg- A string which is the option value like
 # abc0[1-9].ctr.[pad,spd].set.in,xxx0[2-8].mmm.[rst,spd].afr.org,some more values...
 sub parse_mix_strings {
     my @x = split (//, $_[0]);
     my $bracket_close = 0;
     my $bracket_open = 0;
     my @elems;
     my @hstrings;
     for (@x) {
         push @elems, $_;
         if ($_ eq '[') {
             $bracket_open = 1;
         }
         if ($_ eq ']') {
             if ($bracket_open == 1) {
                 $bracket_close = 1;
                 $bracket_open = 0;
             }
         }
         if ($_ eq ',' && !$bracket_open && $bracket_close) {
             $elems[$#elems] =~ s/,//;
             push @hstrings, join('', @elems);
             @elems = ();
         }
     }
     push @hstrings, join('', @elems);
     return @hstrings;
 }

 On *another note* leveraging use of Getopt::Long can be this way
 I think ?

 my %list;
 GetOptions('list=s%' =>
   sub { print "1 = $_[1] 2 = $_[2]\n";
         push(@{$list{$_[1]}}, expand_string($_[2])) });

 print "Elems = ", scalar @{$list{add}}, "\n"; # debug
 print " ", @{$list{add}}, "\n"; # debug skip

 And program can be called as - prog_name.pl --list
 add=abc0[1-2].src.spd.in --list add=volvo[1-5].jeep.sch.edu

Your first idea can be made simpler by choosing a different separator, as comma 
is already being used as a separator for the contents of your square brackets. 
A unique separator means that you only need to call split to get the individual 
strings that you want to expand.
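
As a rough sketch of the separator idea, assuming ';' is chosen and never occurs inside an option value (the choice of ';' and the sample value are assumptions for this example, not something fixed in the thread):

```perl
use strict;
use warnings;

# Assumption: ';' separates the patterns, so the commas inside the
# square brackets can stay as they are.
my $value    = 'abc0[1-2].ctr.[pad,spd].set.in;xxx0[2-3].mmm.rst.afr.org';
my @patterns = split /;/, $value;

# Each element could now be handed to expand_string() one by one.
print "$_\n" for @patterns;
```

On a command line a ';' would normally need quoting, so another character may be more convenient in practice.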

Your second idea can also be simpler. For example...

my @list;
GetOptions('list=s' => sub {push @list, expand_string($_[1]);});

HTH


--
Brian Raven




Please consider the environment before printing this e-mail.

This e-mail may contain confidential and/or privileged information. If you are 
not the intended recipient or have received this e-mail in error, please advise 
the sender immediately by reply e-mail and delete this message and any 
attachments without retaining a copy.

Any unauthorised copying, disclosure or distribution of the material in this 
e-mail is strictly forbidden.


RE: regex like option *values*

2011-03-04 Thread p sena
  __DATA__
  abc0[1-9].ctr.[pad,spd].set.in
  abc[01-22].ctr.[pad,spd].set.in
  abcL[1,2,3].ctr.[pad,spd].set.in
  abcL[1,2,3].ctr.[pad,spd].set.in
  abcL[1,2,3].ctr.[70,001].set.in
 
 ---
 
  It should work for lists of ranges, and ranges of
 strings as well as
  numbers.
 
  Regarding incorporating into Getopt::Long, see the
 Tips and Tricks
  section of the doco.

Brian,

Can this solution be generalized in a way to support
--option_value=abc0[1-9].ctr.[pad,spd].set.in,xxx0[2-8].mmm.[rst,spd].afr.org 
types?
Means those _DATA_ lines all appear in one line separated by comma as above 
(instead of newline separated). Should it be efficient to do in the 
expand_string() or from the main while iteration just before calling 
expand_string.


  


RE: regex like option *values*

2011-03-04 Thread p sena
   __DATA__
   abc0[1-9].ctr.[pad,spd].set.in
   abc[01-22].ctr.[pad,spd].set.in
   abcL[1,2,3].ctr.[pad,spd].set.in
   abcL[1,2,3].ctr.[pad,spd].set.in
   abcL[1,2,3].ctr.[70,001].set.in
  
  ---
  
   It should work for lists of ranges, and
 ranges of
  strings as well as
   numbers.
  
   Regarding incorporating into Getopt::Long,
 see the
  Tips and Tricks
   section of the doco.
 
 Brian,
 
 Can this solution be generalized in a way to support
 --option_value=abc0[1-9].ctr.[pad,spd].set.in,xxx0[2-8].mmm.[rst,spd].afr.org
 types?
 Means those _DATA_ lines all appear in one line separated
 by comma as above (instead of newline separated). Should it
 be efficient to do in the expand_string() or from the main
 while iteration just before calling expand_string.

Replying back with a solution I can see. In case of such option value supplies 
it becomes difficult to do the similar thing as below-
GetOptions ("library=s" => \@libfiles);
   @libfiles = split(/,/,join(',',@libfiles));

Such mixed strings can be parsed and returned as a list as below. In our 
context, to be called from the main before the while iteration. After that this 
list's elems can be passed on to the expand_xxx routine(s) one by one.

# Arg- A string which is the option value like
# abc0[1-9].ctr.[pad,spd].set.in,xxx0[2-8].mmm.[rst,spd].afr.org,some more values...
sub parse_mix_strings {
    my @x = split (//, $_[0]);
    my $bracket_close = 0;
    my $bracket_open = 0;
    my @elems;
    my @hstrings;
    for (@x) {
        push @elems, $_;
        if ($_ eq '[') {
            $bracket_open = 1;
        }
        if ($_ eq ']') {
            if ($bracket_open == 1) {
                $bracket_close = 1;
                $bracket_open = 0;
            }
        }
        # A comma outside brackets ends the current string.
        if ($_ eq ',' && !$bracket_open && $bracket_close) {
            $elems[$#elems] =~ s/,//;
            push @hstrings, join('', @elems);
            @elems = ();
        }
    }
    push @hstrings, join('', @elems);
    return @hstrings;
}
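
The same splitting can probably be done with a single global match instead of a character loop. This is only a sketch under the assumption that brackets are balanced and not nested; the name split_outside_brackets is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical alternative: split on commas that are not inside [...].
# Assumes balanced, non-nested square brackets.
sub split_outside_brackets {
    my ($value) = @_;
    # Each piece is a run of bracket groups and of characters that are
    # neither a comma nor an opening bracket.
    my @parts = $value =~ /((?:\[[^\]]*\]|[^,\[])+)/g;
    return @parts;
}

my @strings = split_outside_brackets(
    'abc0[1-9].ctr.[pad,spd].set.in,xxx0[2-8].mmm.[rst,spd].afr.org');
print "$_\n" for @strings;
```

Each returned string can then be passed on to the expand routine as before.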

On *another note* leveraging use of Getopt::Long can be this way I think ?

my %list;
GetOptions('list=s%' =>
  sub { print "1 = $_[1] 2 = $_[2]\n";
        push(@{$list{$_[1]}}, expand_string($_[2])) });

print "Elems = ", scalar @{$list{add}}, "\n"; # debug
print " ", @{$list{add}}, "\n"; # debug skip

And program can be called as - prog_name.pl --list add=abc0[1-2].src.spd.in 
--list add=volvo[1-5].jeep.sch.edu


~TIA


  


RE: regex like option *values*

2011-03-03 Thread Brian Raven
 -Original Message-
 From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-
 win32-users-boun...@listserv.activestate.com] On Behalf Of p sena
 Sent: 02 March 2011 17:16
 To: perl-win32-users@listserv.ActiveState.com
 Subject: regex like option *values*

 Hi,

 I want to use option and values like:-

 --option_name abc0[1-9].ctr.{pad,spd}.set.in

 or --option_name abc[01-22].ctr.{pad,spd}.set.in

 or --option_name abcL{1,2,3}.ctr.{pad,spd}.set.in

 or --option_name abcL[1,2,3].ctr.{pad,spd}.set.in

 or --option_name abcL{1,2,3}.ctr.{70,001}.set.in

 etc possibilities. This should in fact expand those option values into
 the right number of values/quantities i,e; --option_name will hold
 multiple values. Instead of supplying values one after another I just
 want to club them in a regex like style. I am already using Getopt::Long.

 What could be best way to handle this type of passing option values?
 Is there any existing module for this ?

I could be wrong, but I doubt that an existing module would do what you want. 
Generating all possible strings that match a regex is hard in the general case, 
if not impossible.

However, if you limit the expressions you want to expand and simplify your 
syntax a bit, it's not too difficult. Here's a quick hack that, I think, does 
pretty much what you want.

---
use strict;
use warnings;

while (<DATA>) {
    chomp;
    print "Expanding: $_\n";
    my @result = expand_string($_);
    print "$_\n" for @result;
}

# Expand string to array of strings based on lists & ranges in square
# brackets. Note recursion not strictly necessary, but it simplifies
# the code.
sub expand_string {
    my $str = shift;
    my @result;
    if ($str =~ /^(.*?)\[([^]]+)\](.*)$/) {
        my ($pre, $post) = ($1, $3);
        my @bits = expand_list($2);
        foreach my $bit (@bits) {
            push @result, expand_string("$pre$bit$post");
        }
    }
    else {
        push @result, $str;
    }
    return @result;
}

# Return array from comma separated list of strings and ranges.
sub expand_list {
    my @vals = split /\s*,\s*/, $_[0];
    my @result;
    foreach my $v (@vals) {
        if ($v =~ /^([^-]+)-([^-]+)$/) {
            # Build and evaluate a range expression such as '01'..'22'.
            push @result, eval "'$1'..'$2'";
            die $@ if $@;
        }
        else {
            push @result, $v;
        }
    }
    return @result;
}

__DATA__
abc0[1-9].ctr.[pad,spd].set.in
abc[01-22].ctr.[pad,spd].set.in
abcL[1,2,3].ctr.[pad,spd].set.in
abcL[1,2,3].ctr.[pad,spd].set.in
abcL[1,2,3].ctr.[70,001].set.in
---

It should work for lists of ranges, and ranges of strings as well as numbers.
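
To show why string ranges work, here is a small sketch of Perl's range operator applied to quoted strings. The variable names are only for this example.

```perl
use strict;
use warnings;

# With string operands the range operator uses the magical string
# increment, so leading zeros and letters are both handled.
my @padded  = ('01' .. '03');
my @letters = ('aa' .. 'ad');

print "@padded\n";    # 01 02 03
print "@letters\n";   # aa ab ac ad
```

This is the behaviour the quoted range in expand_list relies on for a pattern like [01-22].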

Regarding incorporating into Getopt::Long, see the Tips and Tricks section of 
the doco.

HTH


--
Brian Raven






RE: regex like option *values*

2011-03-03 Thread p sena
 __DATA__
 abc0[1-9].ctr.[pad,spd].set.in
 abc[01-22].ctr.[pad,spd].set.in
 abcL[1,2,3].ctr.[pad,spd].set.in
 abcL[1,2,3].ctr.[pad,spd].set.in
 abcL[1,2,3].ctr.[70,001].set.in
 ---
 
 It should work for lists of ranges, and ranges of strings
 as well as numbers.
 
 Regarding incorporating into Getopt::Long, see the Tips and
 Tricks section of the doco.
 
 HTH 
 --
 Brian Raven

Thanks Brian,

This solution should work only for brackets irrespective of numbers or strings 
inside them right? The curly braces are not required it seems.

This feature is not there in Getopt::Long and can this be implemented in it or 
it is configurable from it?

Thanks.



  


RE: regex like option *values*

2011-03-03 Thread Brian Raven
 -Original Message-
 From: p sena [mailto:senapati2...@yahoo.com]
 Sent: 03 March 2011 15:40
 To: perl-win32-users@listserv.ActiveState.com; Brian Raven
 Subject: RE: regex like option *values*
 __DATA__
 abc0[1-9].ctr.[pad,spd].set.in
 abc[01-22].ctr.[pad,spd].set.in
 abcL[1,2,3].ctr.[pad,spd].set.in
 abcL[1,2,3].ctr.[pad,spd].set.in
 abcL[1,2,3].ctr.[70,001].set.in
 ---

 It should work for lists of ranges, and ranges of strings as well as
 numbers.

 Regarding incorporating into Getopt::Long, see the Tips and Tricks
 section of the doco.

 HTH
 --
 Brian Raven

 Thanks Brian,

 This solution should work only for brackets irrespective of numbers or
 strings inside them right? The curly braces are not required it seems.

 This feature is not there in Getopt::Long and can this be implemented
 in it or it is configurable from it?

As I said, see 'perldoc Getopt::Long'. A small change to the suggestion in 
Tips and Techniques would look like..

GetOptions('option_name=s%' =>
   sub { push(@{$list{$_[1]}}, expand_string($_[2])) });

I haven't tried it but it looks like it should work.
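
A self-contained sketch of how such a callback could be tried out. Assumptions: the expand_string() here is a cut-down stand-in that only handles comma lists (no ranges), and GetOptionsFromArray is used with a made-up argument list so the example runs without a real command line.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Simplified stand-in for expand_string(): comma lists only, no ranges.
sub expand_string {
    my ($str) = @_;
    if ($str =~ /^(.*?)\[([^\]]+)\](.*)$/) {
        my ($pre, $list, $post) = ($1, $2, $3);
        return map { expand_string("$pre$_$post") } split /\s*,\s*/, $list;
    }
    return $str;
}

my %list;
my @args = ('--option_name', 'add=abcL[1,2].ctr.[pad,spd].set.in');

# For a hash option the callback gets the option, the key and the value.
GetOptionsFromArray(\@args,
    'option_name=s%' => sub {
        my ($opt, $key, $value) = @_;
        push @{ $list{$key} }, expand_string($value);
    },
) or die "bad options\n";

print "$_\n" for @{ $list{add} };
```

With the sample arguments above, the 'add' key should end up holding four expanded strings.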

HTH


--
Brian Raven






regex like option *values*

2011-03-02 Thread p sena

Hi,

I want to use option and values like:-

--option_name abc0[1-9].ctr.{pad,spd}.set.in

or --option_name abc[01-22].ctr.{pad,spd}.set.in
 
or --option_name abcL{1,2,3}.ctr.{pad,spd}.set.in

or --option_name abcL[1,2,3].ctr.{pad,spd}.set.in

or --option_name abcL{1,2,3}.ctr.{70,001}.set.in

etc. possibilities. This should in fact expand those option values into the 
right number of values/quantities, i.e. --option_name will hold multiple values. 
Instead of supplying values one after another I just want to club them in a 
regex-like style. I am already using Getopt::Long.

What could be the best way to handle this type of passing option values? Is there 
any existing module for this?

~TIA


  


Re: Re: Perl Regex

2010-03-26 Thread Mark Bergeron

Does this have to be hard coded into the script? Just wondering since I have been kinda following this thread.

Feb 26, 2010 01:56:02 PM, perl-win32-users-boun...@listserv.activestate.com wrote:
 It looks like what u want to do is attribute folding. That's when u take a
 nested XML tag and make it an attribute of an enclosing tag. Ur doing
 something slightly different which is merging equal depth tags. The right
 way to do this is with an XML parser. Look into XML::Simple to get started.
 U would read in the XML to a hash, manipulate the data in the hash, and then
 write out a new XML file.

 Regex can do this in a degenerate case but it becomes unmanageable fast.
 But since u asked

 $xml =~
 s{<index-item>(\s*)<index-entry>([^<]*)</index-entry>\s*<pageId>([^<]*)</pageId>(\s*)</index-item>}{<index-item>$1<index-entry pages="$3">$2</index-entry>$4</index-item>}sg;

 HTH


Re: Perl Regex

2010-02-28 Thread Jon Bjornstad
Yes, you could use an XML parser to do the job described below
but this case is pretty simple.   Here's my offering
leaving out the reading/writing of the files.

--
my $s = <<EOF;
<index-item>
<index-entry>APOE e4 variant</index-entry> <pageId>18</pageId>
</index-item>
<index-item>
<index-entry>arousal disorders</index-entry> <see href="c-86679-1"
label="see">disorders of arousal</see>
</index-item>
<index-item>
<index-entry>arterial blood gas tests</index-entry> <pageId>32</pageId>
</index-item>
<index-item>
<index-entry>asthma</index-entry> <pageId>28--9, 295</pageId>
</index-item>
EOF

$s =~ s{<index-entry>(.*?)</index-entry>\s*<pageId>(.*?)</pageId>}
{<index-entry pages="$2">$1</index-entry>}g;

print $s;
--

You could replace the two .*? with [^<]* if you wanted to be more precise
but it looks more confusing.

Jon



Perl Regex

2010-02-26 Thread Kprasad
Hi All

What will be the perfect Regular Expression to convert below mentioned 'Search 
Text' to 'Replacement Text' while 'Single Line' option is ON.

When I use below mentioned Regex
<index-entry(?:[^>]+)?>((?!<\/index-entry>).*?)</index-entry>\s*<pageId>([0-9]+)</pageId>

And replaces wrongly

<index-entry pages="32">arousal disorders</index-entry><see href="c-86679-1" 
label="see">disorders of arousal</see>
</index-item>
.

Search Text:

<index-item>
<index-entry>APOE e4 variant</index-entry> <pageId>18</pageId>
</index-item>
<index-item>
<index-entry>arousal disorders</index-entry> <see href="c-86679-1" 
label="see">disorders of arousal</see>
</index-item>
<index-item>
<index-entry>arterial blood gas tests</index-entry> <pageId>32</pageId>
</index-item>
<index-item>
<index-entry>asthma</index-entry> <pageId>28--9, 295</pageId>
</index-item>

Correct Replacement Text should be:

<index-item>
<index-entry pages="18">APOE e4 variant</index-entry>
</index-item>
<index-item>
<index-entry>arousal disorders</index-entry> <see href="c-86679-1" 
label="see">disorders of arousal</see>
</index-item>
<index-item>
<index-entry pages="32">arterial blood gas tests</index-entry>
</index-item>
<index-item>
<index-entry pages="28--29,295">asthma</index-entry>
</index-item>

Kanhaiya


RE: Perl Regex

2010-02-26 Thread Brian Raven

From: perl-win32-users-boun...@listserv.activestate.com
[mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of
Kprasad
Sent: 26 February 2010 15:56
To: perl-win32-users@listserv.ActiveState.com
Subject: Perl Regex

 Hi All

 What will be the perfect Regular Expression to convert below mentioned
 'Search Text' to 'Replacement Text' while 'Single Line' option is ON.

 When I use below mentioned Regex

 <index-entry(?:[^>]+)?>((?!<\/index-entry>).*?)</index-entry>\s*<pageId>([0-9]+)</pageId>

 And replaces wrongly

I think it is going to be hard to be of much help. Mostly because you
don't show us any Perl.

First, a regular expression can't change anything, it can only match.

Second, I find it easier to work out what is going on with non-trivial
regular expressions if I use the 'x' switch, which allows me to break
the RE over multiple lines, and include comments. Particularly useful
with the 'qr' quoting operator. Your RE, for example, might look like
this.

my $re=qr{<index-entry(?:[^>]+)?>
  ((?!<\/index-entry>).*?)
  </index-entry>
  \s*
  <pageId>
  ([0-9]+)
  </pageId>
  }x;
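
As a further sketch of the same idea, here is a commented pattern under the 'x' switch. It is a simplified pattern written for this example (entries without attributes, no nested tags), not the pattern from the question.

```perl
use strict;
use warnings;

# Simplified, hypothetical pattern; comments and whitespace are ignored
# under /x. Avoid sigils in the comments, since the pattern is still
# interpolated before it is compiled.
my $re = qr{
    <index-entry>            # opening tag, no attributes expected
    ( [^<]* )                # capture 1: the entry text
    </index-entry>
    \s*
    <pageId> ( [^<]* ) </pageId>   # capture 2: the page list
}x;

my $line = '<index-entry>asthma</index-entry> <pageId>28--9, 295</pageId>';
if ($line =~ $re) {
    print "entry=$1 pages=$2\n";   # entry=asthma pages=28--9, 295
}
```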

However, as you don't provide any information on how that RE is used,
it's going to be difficult to say what might be going wrong. If you could
provide a small example script, that we could cut & paste & run, it
would make it much easier.

Finally, your data looks a lot like XML. A dedicated parser will
generally do a more reliable job of parsing XML than regular
expressions, even Perl regular expressions.

HTH

-- 
Brian Raven 



Re: Perl Regex

2010-02-26 Thread Chris Wagner
It looks like what u want to do is attribute folding.  That's when u take a
nested XML tag and make it an attribute of an enclosing tag.  Ur doing
something slightly different which is merging equal depth tags.  The right
way to do this is with an XML parser.  Look into XML::Simple to get started.
U would read in the XML to a hash, manipulate the data in the hash, and then
write out a new XML file.

Regex can do this in a degenerate case but it becomes unmanageable fast.
But since u asked

$xml =~
s{<index-item>(\s*)<index-entry>([^<]*)</index-entry>\s*<pageId>([^<]*)</pageId>(\s*)</index-item>}{<index-item>$1<index-entry pages="$3">$2</index-entry>$4</index-item>}sg;

HTH





--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis

0100



Re: Perl Regex

2010-02-26 Thread Kprasad
Here is the chunk of code which I used to perform this task:

 open(XML, "<$ARGV[0]") or die "Can not open $ARGV[0]: $!";
 my $xmltext;
 {
  local $/ = undef;
  $xmltext=<XML>;
 }
 close(XML);
 while($xmltext=~ /<index-entry(?:[^>]+)?>(?:.*?)<\/index-entry>(?:[^\n]*?)<pageId>([^<]+)<\/pageId>/is)
 {
  $page=$2;
  $page=~ s/ *\n+\t+/ /g;
  $page=~ s/, /,/g;
  $xmltext=~ s|<index-entry(?:[^>]+)?>(.*?)</index-entry>(?:[^\n]*?)<pageId>[^<]+</pageId>|<index-entry chid="$1" pages="$page">$2</index-entry>|s
 }
 $xmltext=~ s/<index-entry chid=/<index-entry id=/;
 open(XMLOUT, ">$localpath/$xmlfile\_final.xml") or die "Can not open $localpath/$xmlfile\_final.xml: $!";
 print XMLOUT $xmltext;
 close(XMLOUT);

Thanks
Kanhaiya



More efficient regex

2007-02-28 Thread Chris O

Gurus,

In the sample below, I'm checking $foo for all caps or all lowercase. Is
there a more efficient regex method?

- Chris



$foo='APPLE JONES PARKER';

if(($foo!~/[A-Z]/)or($foo!~/[a-z]/)){
$foo=title_case($foo);
}

print $foo."\n";




sub title_case{
  my($string) = @_;
  my @exception_words = ('A', 'The', 'If', 'Is', 'It', 'Of', 'Our',
'An','On', 'In', 'But', 'With', 'Has', 'Had', 'Have');
  my @exception_stuff = ('N','S','E','W','NE','NW','SE','SW','PO','BOX');
  
  $string =~ s/([\w']+)/\u\L$1/g;
  foreach(@exception_words){$string =~ s/\b$_\b/lc($_)/ge;} # Make Exception Words LC
  foreach(@exception_stuff){$string =~ s/\b$_\b/$_/gei;} # Make Exception Stuff Correct Case
  
  $string =~ s/(.)/\u$1/; # Uppercase the first letter
  return $string;
}



Re: More efficient regex

2007-02-28 Thread Chris Wagner
I'm guessing that u want to canonicalize the capitalization of string words
right?  Like BOB JONES -> Bob Jones.  There is a faster way to check for
mixed casedness.

%tolowers = map {$_, 1} ('A', 'The', 'If', 'Is', 'It', 'Of', 'Our',
'An','On', 'In', 'But', 'With', 'Has', 'Had', 'Have');
%touppers = map {$_, 1} ('N','S','E','W','NE','NW','SE','SW','PO','BOX');
$uppers = $text =~ tr/A-Z/A-Z/; #count uppercase letters
$lowers = $text =~ tr/a-z/a-z/; #count lowercase letters

if ($uppers and not $lowers) { #all upper case
    $text = fixcase($text);
}
elsif ($lowers and not $uppers) { #all lower case
    $text = fixcase($text);
}

sub fixcase {
    my $text = $_[0];
    my @text = map {ucfirst(lc($_))} split / /, $text;
    foreach $i (@text) { $tolowers{$i} and $i = lc $i; }
    foreach $i (@text) { $touppers{uc $i} and $i = uc $i; }
    $text = join " ", @text;
    # do whatever else
    return $text;
}

That should do it and be about as efficient as possible. :)  If u have to
deal with sentences then u'll need a few more lines to deal with periods and
commas.
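
The counting check on its own can be sketched like this. The function name is_single_case is made up for the example, and it only looks at ASCII letters, like the tr ranges above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the string contains letters of only
# one case (ASCII only). tr with an empty replacement list just counts.
sub is_single_case {
    my ($text) = @_;
    my $uppers = ($text =~ tr/A-Z//);
    my $lowers = ($text =~ tr/a-z//);
    return ($uppers xor $lowers) ? 1 : 0;
}

for my $s ('APPLE JONES PARKER', 'apple jones', 'Apple Jones') {
    print "$s: ", is_single_case($s), "\n";
}
```

tr does not use the regex engine at all, which is why it tends to be cheaper than two pattern matches.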

These O'Reilly gems are useful too.
Finding all-caps words 
@capwords = m/(\b[^\Wa-z0-9_]+\b)/g;
Finding all-lowercase words 
@lowords = m/(\b[^\WA-Z0-9_]+\b)/g;
Finding initial-caps word 
@icwords = m/(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)/;




--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis

0100



Regex: Remove A HREF tag

2006-10-29 Thread eyal edri
Hi,

I'm trying to use regex to strip an HTML text from the A HREF tag... but keep the http link intact.

Here is an example:

Here is the dirty text:

$dirty = "Check <A HREF=\"http://www.opengroup.org/cde/\" TARGET=\"_blank\">The Open Group's Web site</A> for updates.
<P> For Solaris/Sun OS, use <A HREF=\"http://www.securityfocus.com/archive/1/358426\" TARGET=\"_blank\"> this workaround</A>
for protecting the 'dtlogin' service from remote access </A>. Sun also released a patch available at <A HREF=\"http://sunsolve.sun.com/search/document.do?assetkey=1-26-57539-1\" TARGET=\"_blank\">Sun Alert 57539</A>.";

** '\' were added to regard the " (double quotes) as text.

Here is how the text should look like:

$clean = "Check [http://www.opengroup.org/cde/] {The Open Group's Web site} for updates. For Solaris/Sun OS, use [http://www.securityfocus.com/archive/1/358426] this workaround for protecting the 'dtlogin' service from remote access. Sun also released a patch available at [http://sunsolve.sun.com/search/document.do?assetkey=1-26-57539-1] {Sun Alert 57539}";

I'm using this to remove any HTML tags, but it also removes the HREF tags:

 # remove all HTML TAGS
 $solution =~ s/<[^>]*>//gs;
 # remove all escape chars like &gt; &quot;
 $solution =~ s/&gt;//gs;
 $solution =~ s/&quot;//gs;

Can you help?

-- 
Eyal Edri | System & Security Engineer | [EMAIL PROTECTED] Communication.


Re: Regex: Remove A HREF tag

2006-10-29 Thread Chris Wagner
First convert the links then strip the html tags.

$text =~ s/<a\s[^>]*?href="?([^"\s>]+)"?[^>]*>(.*?)<\/a>/[$1] {$2}/gis;
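
A minimal sketch of that two-step approach as a complete script. The sub name links_to_text and the shortened sample string are made up for illustration, and the pattern assumes the HREF value contains no spaces; for anything beyond simple input a real HTML parser would probably be the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite links as "[url] {text}", then drop other tags.
sub links_to_text {
    my ($text) = @_;
    # Step 1: capture the HREF value and the link text.
    $text =~ s/<a\s[^>]*?href="?([^"\s>]+)"?[^>]*>(.*?)<\/a>/[$1] {$2}/gis;
    # Step 2: strip whatever tags remain.
    $text =~ s/<[^>]*>//g;
    return $text;
}

my $dirty = q{Check <A HREF="http://www.opengroup.org/cde/" TARGET="_blank">The Open Group's Web site</A> for updates.<P>};
print links_to_text($dirty), "\n";
```

The order matters: if the generic tag stripper ran first, the URL inside the A tag would already be gone by the time the link pattern was tried.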






--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis

0100



Perl -pi -e 'regex'

2006-05-24 Thread Adam R. Frielink
I've had some trouble with the command-line syntax for a string search and
replace using a perl command line.

My cmd line was:

perl -pi -e 's!\xae!\&\#169!g' list of files

This did not replace the occurrences of hex EA

The script for what that above cmd line should compile into (according
to the Camel Book) is:

__BEGIN__
#!perl.exe
$extension = '*';
LINE: while (<>) {
    if ($ARGV ne $oldargv) {
        if ($extension !~ /\*/) {
            $backup = $ARGV . $extension;
        }
        else {
            ($backup = $extension) =~ s/\*/$ARGV/g;
        }
        rename($ARGV, $backup);
        open(ARGVOUT, ">$ARGV");
        select(ARGVOUT);
        $oldargv = $ARGV;
    }
    s/\xae/\&\#169\;/g;
}
continue {
    print;  # this prints to original filename
}
select(STDOUT);
__END__

Running this actual script works, but not the commandline version.  Did
I not escape the commandline properly?



Re: Perl -pi -e 'regex'

2006-05-24 Thread Lyle Kopnicky

Adam R. Frielink wrote:

> I've had some trouble with a commandline syntax for a string search and
> replace using a perl command line.
>
> My cmd line was:
>
> perl -pi -e 's!\xae!\&\#169!g' list of files
>
> This did not replace the occurace hex EA
>
> The script for what that above cmd line should compile into (according
> to the Camel Book) is:
>
> ...
>
> s/\xae/\&\#169\;/g;
>
> ...
>
> Running this actual script works, but not the commandline version.  Did
> I not escape the commandline properly?

Looks like you forgot the \; in the command line.

--
Lyle Kopnicky
Software Project Engineer
Veicon Technology, Inc.



Re: Perl -pi -e 'regex'

2006-05-24 Thread $Bill Luebkert
Adam R. Frielink wrote:

> I've had some trouble with a commandline syntax for a string search and
> replace using a perl command line.
>
> My cmd line was:
>
> perl -pi -e 's!\xae!\&\#169!g' list of files

I tried this on tcsh and cmd.exe.  cmd.exe needs "s instead of 's,
tcsh doesn't like !, and Perl wants a backup ext:

perl -pi.bak -e "s{\xae}{&#169}g" foo

> This did not replace the occurace hex EA

You mean AE ?
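
One way to sidestep shell quoting altogether is to put the same in-place edit into a small script. This is only a sketch: the sub name replace_in_place is invented here, and it relies on setting $^I (the variable behind the -i switch) and @ARGV locally so that the <> loop rewrites each file and keeps a backup.

```perl
use strict;
use warnings;

# Hypothetical helper: edit files in place, keeping a backup with suffix $ext,
# much as "perl -pi$ext -e ..." would.
sub replace_in_place {
    my ($ext, @files) = @_;
    local ($^I, @ARGV) = ($ext, @files);   # $^I switches on in-place editing
    while (<>) {
        s/\xae/&#169;/g;   # the replacement side is a plain string here
        print;             # goes to the file currently being rewritten
    }
    return;
}

# Does nothing when no file names are given on the command line.
replace_in_place('.bak', @ARGV) if @ARGV;
```

Because the pattern now lives in a file, neither cmd.exe nor tcsh gets a chance to reinterpret the quotes, the ! or the &.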



AW: Problem with regex

2006-05-14 Thread Holger Wöhle
> use strict;
> use warnings;
>
> my $Data = 'Hello, i am a litte String.$ Please format me.$$$
> I am the end of the String.$$ And i am the last!';
>
> $Data =~ s/([^\$]*)\${3,3}([^\$]+)/$1\<br\>\<br\>$2/gm;
> $Data =~ s/([^\$]*)\${2,2}([^\$]+)/$1\<p\>$2/gm;
> $Data =~ s/([^\$]*)\${1,1}([^\$]+)/$1\<br\>$2/gm;
> print "Data: $Data \n";
>
> ___END___
>
> Notice, I change the double quotes to single quotes for $Data.
> For me, the regex is clear. But if not for you, I can explain.
> There are maybe some better solution, this is just a quick one.

Hello,
First of all, many thanks for your quick and helpful replies.
I tried Karl-Heinz's solution and it works very well.
Karl-Heinz: Yes, the regex is clear to me; the solution with $1 and $2 was a
good idea.

regards
Holgi

p.s. next time i should first take the Owls with me in the bath tub ;-) 






Problem with regex

2006-05-12 Thread Holger Wöhle
 
Hello,
under Windows with ActiveState Perl I have a strange problem with a regex.
Assume the following string:

my $Data = "Hello, i am a litte String.$ Please format me.$$$ I am the end
of the String.$$ And i am the last!";

The regex should replace $ with the string <br>, $$ with <p> and $$$ with
<br><br> (please don't think about the why).

I tried to use the following:
$data =~ s/\$\$\$/<br><br>/gm; #should catch every occurrence of $$$
$data =~ s/\$\$/<p>/gm; #should catch $$
$data =~ s/\$/<br>/gm; #the rest

So the data should look like this after the first regex:
Hello, i am a litte String.$Please format me.<br><br>I am the end of the
String.$$And i am the last!
And after the second:
Hello, i am a litte String.$Please format me.<br><br>I am the end of the
String.<p>And i am the last!
And the last:
Hello, i am a litte String.<br>Please format me.<br><br>I am the end of the
String.<p>And i am the last!

But all regexes I tried (the ones above are only one attempt) failed! When I
print out the string it looks like:

Hello, i am a litte String. Please format me. I am the end of the
String.3398 And i am the last!

where the number after "String." differs between every run.

Can someone help me?

With regards
Holger



Re: Problem with regex

2006-05-12 Thread Karl-Heinz Kuth

Hello,



This works at least on my machine:

use strict;
use warnings;

my $Data = 'Hello, i am a litte String.$ Please format me.$$$ I am the
end of the String.$$ And i am the last!';


$Data =~ s/([^\$]*)\${3,3}([^\$]+)/$1\<br\>\<br\>$2/gm;
$Data =~ s/([^\$]*)\${2,2}([^\$]+)/$1\<p\>$2/gm;
$Data =~ s/([^\$]*)\${1,1}([^\$]+)/$1\<br\>$2/gm;
print "Data: $Data \n";

___END___

Note that I changed the double quotes to single quotes for $Data.
For me, the regex is clear. But if it is not for you, I can explain.
There may be a better solution; this is just a quick one.
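
As one possible alternative to three separate substitutions, a single pass with a lookup table also works. This is a sketch, not the code from the thread: the sub name dollars_to_tags and the hash %tag_for are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: map runs of one, two or three dollar signs to markup.
sub dollars_to_tags {
    my ($text) = @_;
    my %tag_for = ( '$' => '<br>', '$$' => '<p>', '$$$' => '<br><br>' );
    # \${1,3} is greedy, so the longest run is tried first.
    $text =~ s/(\${1,3})/$tag_for{$1}/g;
    return $text;
}

# Single quotes: nothing is interpolated, so the dollar signs survive.
my $Data = 'Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!';
print dollars_to_tags($Data), "\n";
```

Doing it in one pass avoids any dependence on the order of the substitutions.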

Regards
Karl-Heinz




RE: Problem with regex

2006-05-12 Thread Yekhande, Seema \(MLITS\)
Holger,

Actually, $ is a special character in strings in Perl. So if a $ appears in
the input, you always have to write it with a leading escape character.

So write your input like this:
my $data = "Hello, i am a litte String.$ Please format me.$$$ I am the end
of the String.$$ And i am the last!";

That will solve your problem.

Thanks,
Seema
GPCT|TDDS|AIS|SPCM3




Re: Problem with regex

2006-05-12 Thread Andy Speagle
Holger,

This worked for me. Note that you need to escape the $ characters in your string. The 3398 number is actually the PID of the perl process, returned from the special variable $$, since you didn't escape the $ characters.

my $Data = "Hello, i am a litte String.\$ Please format me.\$\$\$ I am the end of the String.\$\$ And i am the last!";
$Data =~ s/[\$]{3}/<br><br>/g;
$Data =~ s/[\$]{2}/<p>/g;
$Data =~ s/\$/<br>/g;
print $Data . "\n";

Hope that helps...
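
The quoting difference can be checked in isolation with a few lines. The variable names below are invented for the illustration; the only point is that $$ inside double quotes is the process ID.

```perl
use strict;
use warnings;

my $single  = 'String.$$ And';     # single quotes: kept exactly as typed
my $escaped = "String.\$\$ And";   # double quotes with escapes: same text
my $interp  = "String.$$ And";     # double quotes, no escapes: $$ becomes the PID

print "$single\n$escaped\n$interp\n";
```

The first two lines print identically; the third shows a number that changes from run to run, which matches the symptom in the original post.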

Andy Speagle



RE: Problem with regex

2006-05-12 Thread John Deighan

At 09:47 AM 5/12/2006, Yekhande, Seema \(MLITS\) wrote:

> Holger,
>
> Actually $ is a special character in string in perl. So, if the $ is
> there in the input,
> you will have to always write it with the leading escape character.
>
> So, make your input will be like this,
> my $data = "Hello, i am a litte String.$ Please format me.$$$ I am the end
> of the String.$$ And i am the last!";
>
> It will solve your problem.

$ is only special in strings with double quote marks ( " ) around
them. I think you meant to say:

my $data = "Hello, i am a little String.\$ Please format me.\$\$\$ I
am the end of the String.\$\$ And i am the last!";

That works, but you can also use:

my $data = 'Hello, i am a little String.$ Please format me.$$$ I am
the end of the String.$$ And i am the last!';

(Note the type of quote mark used)

If you were to print out the original string data like this:

my $data = "Hello, i am a litte String.$ Please format me.$$$ I am
the end of the String.$$ And i am the last!";

print("$data\n");

you would get this:

Hello, i am a litte String. format me. I am the end of the
String.1896 And i am the last!

i.e., the original string did not have any '$' characters in it at all.



Re: Regex Needed

2006-03-25 Thread DZ-Jay

Unpack is even faster, for fixed-format strings.

dZ.
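
A sketch of the unpack idea, assuming the fixed layout "xxxN yyyNyyy sss" (13 characters to skip, then 3 to keep). The sub name sss_field and the sample record are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper for the fixed layout "xxxN yyyNyyy sss".
sub sss_field {
    my ($string) = @_;
    # x13 skips thirteen bytes, A3 takes the next three as text.
    my ($s) = unpack 'x13 A3', $string;
    return $s;
}

print sss_field('MBH1 WELL123 PIT'), "\n";
```

The template has to change if the field widths do, so this only pays off when the record format really is fixed.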

On Mar 24, 2006, at 22:19, Chris Wagner wrote:

> At 10:38 AM 3/24/2006 -0700, Paul Rousseau wrote:
>
>>   I am looking for help on a regex that examines strings such as
>>
>> xxxN yyy sssNNN
>> xxxN yyyNyyy sss
>> xxxN yyyNyyy ssN
>>
>> and returns only the sss part?  N is always a numeral, and s is always
>> alphabetic.
>
> Do u have to examine those as fixed strings or as variable strings?
> Meaning, do u know ahead of time which format ur looking at?  If u don't
> know the format ahead of time then u should use the regex.  But if u do
> know the format ahead of time (like it never changes for one application)
> then u shouldn't use a regex.  Using substr will be faster.
>
> xxxN yyy sssNNN
> $s = substr $string, 9, 3;
> xxxN yyyNyyy sss
> $s = substr $string, 13, 3;
> xxxN yyyNyyy ssN
> $s = substr $string, 13, 2;
>
> #don't know what format $string will be
> ($s) = $string =~ m/\S+ \S+ ([a-z]+)/i;








Regex Needed

2006-03-24 Thread Paul Rousseau

Hello,

  I am looking for help on a regex that examines strings such as

xxxN yyy sssNNN
xxxN yyyNyyy sss
xxxN yyyNyyy ssN

and returns only the sss part?  N is always a numeral, and s is always 
alphabetic.


Here is what I have so far as an example. I believe there is an elegant way
to do this in a single regex.


my (
 $string,
 $prefix
);

$string = "MBH1 WELL PIT050";
($prefix) = $string =~    # I want $prefix to equal "PIT"

Thank you.




RE: Regex Needed

2006-03-24 Thread Joe Discenza
Paul Rousseau wrote, on Friday, March 24, 2006 12:38 PM

:I am looking for help on a regex that examines strings such as
: 
: xxxN yyy sssNNN
: xxxN yyyNyyy sss
: xxxN yyyNyyy ssN
: 
: and returns only the sss part?  N is always a numeral, and s 
: is always alphabetic.

Does /.*(\d+)/ do what you want? Or is there more to the string after
what you've shown?

Good luck,

Joe



RE: Regex Needed

2006-03-24 Thread Wagner, David --- Senior Programmer Analyst --- WGO
[EMAIL PROTECTED] wrote:
> Hello,
>
>    I am looking for help on a regex that examines strings such as
>
> xxxN yyy sssNNN
> xxxN yyyNyyy sss
> xxxN yyyNyyy ssN
>
> and returns only the sss part?  N is always a numeral, and s is always
> alphabetic.
>
> Here is what I have so far as an example. I believe there is an
> eloquent way to do this in a single regex.
>
> my (
>   $string,
>   $prefix
>  );
>
> $string = "MBH1 WELL PIT050";
> ($prefix) = $string =~    # I want $prefix to equal PIT

If it is really of that format, then /\s([A-Za-z]+)\d*$/ does it in one shot:
it looks for a space followed by letters, then optional digits, then the end
of the line or data.

Wags ;)
 


RE: Regex Needed

2006-03-24 Thread Jerry Kassebaum

$string = "MBH1 WELL PIT050";
$string =~ s/.* (.*?)\d+/$1/;   # Question mark makes it non-greedy
($prefix) = $string;            # Didn't figure out how to do ($prefix) = $string =~

print "$prefix\n";
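
On the open question of how to write ($prefix) = $string =~ ...: in list context a match returns its captured groups, so the substitution is not needed. A minimal sketch follows; the sub name extract_prefix is invented, and the pattern assumes the wanted letters start the last space-separated field.

```perl
use strict;
use warnings;

# Hypothetical helper: return the letters at the start of the last field.
sub extract_prefix {
    my ($string) = @_;
    # The parentheses around $prefix force list context, so the match
    # hands back what the capture group caught.
    my ($prefix) = $string =~ /\s([A-Za-z]+)\d*$/;
    return $prefix;
}

print extract_prefix("MBH1 WELL PIT050"), "\n";
```

Unlike the s/// version, this leaves $string untouched.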





Re: Regex Needed

2006-03-24 Thread Brian H. Oak

Paul,

Give this a shot:

/^\w+\s+\w+\s+([A-Za-z]+)\d*/

A regex should be as explicit and exclusive as possible, so I would 
remove the lowercase (a-z) portion of the character class if you know 
for sure that the letters you want will always be uppercase.


-Brian

_
Brian H. Oak   CISSP CISA
Acorn Networks  Security
http://acornnetsec.com/





Re: Regex Needed

2006-03-24 Thread
Try this:
  $string =~ /^.{3}\d\s[^\s]+\s([a-zA-Z]+)\d*$/;
  $prefix = $1;

That should match:
  - any three characters at the beginning of the string: ^.{3}
  - followed by a digit: \d
  - followed by whitespace: \s
  - followed by one or more characters up to the next whitespace: [^\s]+
  - followed by whitespace: \s
  - grab all the following characters that are letters: ([a-zA-Z]+)
  - followed by zero or more digits until the end of the string: \d*$

Is that an accurate description?
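
The same description can be kept next to the pattern itself with the /x modifier, which allows whitespace and comments inside a regex. This is a sketch under the same assumptions as the list above; the names $record_re and prefix_of are invented here.

```perl
use strict;
use warnings;

# The pattern described above, spelled out with /x comments.
my $record_re = qr/
    ^ .{3}        # any three characters at the start
    \d            # one digit
    \s            # whitespace
    [^\s]+        # the middle field, up to the next whitespace
    \s            # whitespace
    ([a-zA-Z]+)   # the letters we want
    \d* $         # optional digits, then end of string
/x;

# Hypothetical helper: returns the captured letters, or undef on no match.
sub prefix_of {
    my ($string) = @_;
    my ($prefix) = $string =~ $record_re;
    return $prefix;
}

print prefix_of('MBH1 WELL PIT050'), "\n";
```

A string that does not fit the layout gives undef, so callers can test the result before using it.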

 -dZ.



Re: Regex Needed

2006-03-24 Thread David Legault

Something like this :

/(?:\w+\s){2}([a-zA-Z]+)\d*/

David



Re: Regex Needed

2006-03-24 Thread Chris Wagner
At 10:38 AM 3/24/2006 -0700, Paul Rousseau wrote:
>   I am looking for help on a regex that examines strings such as
>
> xxxN yyy sssNNN
> xxxN yyyNyyy sss
> xxxN yyyNyyy ssN
>
> and returns only the sss part?  N is always a numeral, and s is always
> alphabetic.

Do u have to examine those as fixed strings or as variable strings?  Meaning,
do u know ahead of time which format ur looking at?  If u don't know the
format ahead of time then u should use the regex.  But if u do know the
format ahead of time (like it never changes for one application) then u
shouldn't use a regex.  Using substr will be faster.

xxxN yyy sssNNN
$s = substr $string, 9, 3;
xxxN yyyNyyy sss
$s = substr $string, 13, 3;
xxxN yyyNyyy ssN
$s = substr $string, 13, 2;

#don't know what format $string will be
($s) = $string =~ m/\S+ \S+ ([a-z]+)/i;






--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis

0100



Re: Compiled regex error?

2006-02-21 Thread Chaddaï Fouché

$Bill Luebkert wrote:

> Maurice Height wrote:
>
>> I have just solved a bug in my code involving a compiled regex.
>> I am wondering if I have got it wrong or if this is a Perl error.
>> To explain...
>>
>> I had a class ABC in which I passed a value to the constructor
>> (eg: $arg{delim_str} ) and stored this for later use throughout the class:
>>
>>    $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/os;
>>
>> Now when I create 2 class objects, each with a different value of
>> $arg{delim_str}, the first instance works correctly, but the second
>> seems to be using the same value that was created in the first instance.
>
> To my understanding, that's what it's supposed to do.  The /o says
> you don't have to re-interpolate the contents of $arg{delim_str}
> after the first time.  So just remove the /o and you should be fine.
>
>> For example:
>>
>>    my $abc1 = ABC->new( delim_str => q{|} );
>>    # do some stuff with $abc1
>>
>>    my $abc2 = ABC->new( delim_str => q{,} );
>>    # do some stuff with $abc2
>>    *** does not work because the value of $self->{DELIM_RE}
>>    *** used in $abc2 is the same as that in $abc1
>>
>> However if I remove the 'o' option from the regex, everything is OK.
>>
>>    $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/s;
>>
>> I had assumed that even though a regex is compiled once only with the 'o'
>> option, that instances of class data would be INDEPENDENT of each other.

There are two ways to keep a regex from being recompiled on every use:
either you add /o, and then the regex is compiled only once per run (which
is exactly what happened in your script), or you use qr/.../ to get a
compiled regex in a scalar and match against that; the regex is then
compiled only when the qr/.../ statement is evaluated. You should not mix
the two solutions as you did.
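
A small sketch of the qr// route without /o, so that each object gets a pattern built from its own delimiter. The class name ABC and the delim_str argument come from the thread; the fields method is invented here purely to show the stored pattern in use.

```perl
use strict;
use warnings;

package ABC;

sub new {
    my ($class, %arg) = @_;
    my $self = bless {}, $class;
    # No /o: qr// is evaluated on every call, using this object's delimiter.
    $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/s;
    return $self;
}

# Hypothetical method, added only to demonstrate the stored pattern.
sub fields {
    my ($self, $line) = @_;
    return split $self->{DELIM_RE}, $line;
}

package main;

my $abc1 = ABC->new( delim_str => q{|} );
my $abc2 = ABC->new( delim_str => q{,} );

print join(' ', $abc1->fields('a|b|c')), "\n";
print join(' ', $abc2->fields('x,y,z')), "\n";
```

Each object holds its own compiled pattern, and matching against it later does not recompile anything.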


--
Jedaï



Compiled regex error?

2006-02-20 Thread Maurice Height
I have just solved a bug in my code involving a compiled regex.
I am wondering if I have got it wrong or if this is a Perl error.
To explain...

I had a class ABC in which I passed a value to the constructor
(eg: $arg{delim_str} ) and stored this for later use throughout the class:

   $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/os;

Now when I create 2 class objects, each with a different value of
$arg{delim_str}, the first instance works correctly, but the second
seems to be using the same value that was created in the first instance.

For example:

   my $abc1 = ABC->new( delim_str => q{|} );
   # do some stuff with $abc1

   my $abc2 = ABC->new( delim_str => q{,} );
   # do some stuff with $abc2
   *** does not work because the value of $self->{DELIM_RE}
   *** used in $abc2 is the same as that in $abc1

However if I remove the 'o' option from the regex, everything is OK.

   $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/s;

I had assumed that even though a regex is compiled once only with the 'o'
option, that instances of class data would be INDEPENDENT of each other.

Any comments welcome...

Maurice 




Re: Compiled regex error?

2006-02-20 Thread $Bill Luebkert
Maurice Height wrote:

> I have just solved a bug in my code involving a compiled regex.
> I am wondering if I have got it wrong or if this is a Perl error.
> To explain...
>
> I had a class ABC in which I passed a value to the constructor
> (eg: $arg{delim_str} ) and stored this for later use throughout the class:
>
>    $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/os;
>
> Now when I create 2 class objects, each with a different value of
> $arg{delim_str}, the first instance works correctly, but the second
> seems to be using the same value that was created in the first instance.

To my understanding, that's what it's supposed to do.  The /o says
you don't have to re-interpolate the contents of $arg{delim_str}
after the first time.  So just remove the /o and you should be fine.

> For example:
>
>    my $abc1 = ABC->new( delim_str => q{|} );
>    # do some stuff with $abc1
>
>    my $abc2 = ABC->new( delim_str => q{,} );
>    # do some stuff with $abc2
>    *** does not work because the value of $self->{DELIM_RE}
>    *** used in $abc2 is the same as that in $abc1
>
> However if I remove the 'o' option from the regex, everything is OK.
>
>    $self->{DELIM_RE} = qr/\Q$arg{delim_str}\E/s;
>
> I had assumed that even though a regex is compiled once only with the 'o'
> option, that instances of class data would be INDEPENDENT of each other.

Apparently not a good assumption.

perlretut man page:

Part 1: The basics
...
  Using regular expressions in Perl
...
There are a few more things you might want to know about matching operators.
First, we pointed out earlier that variables in regexps are substituted
before the regexp is evaluated:

    $pattern = 'Seuss';
    while (<>) {
        print if /$pattern/;
    }

This will print any lines containing the word Seuss. It is not as
efficient as it could be, however, because perl has to re-evaluate $pattern
each time through the loop. If $pattern won't be changing over the lifetime
of the script, we can add the //o modifier, which directs perl to only
perform variable substitutions once:

    #!/usr/bin/perl
    #    Improved simple_grep
    $regexp = shift;
    while (<>) {
        print if /$regexp/o;  # a good deal faster
    }



RE: Yet another regex question

2006-01-17 Thread Thomas, Mark - BLS CTR
> I'd like to thank everybody who came up with suggestions.
> One thing I forgot to point out is that there are also people with
> whitespace in their *given* names, which seems to make things even
> more problematic

I've updated my solution to accommodate that:

while (<DATA>) {
    my @cols =
        m/^(\d+)      #id1
          \s(\(\d+\)) #id2
          \s([\w ]+), #lastnames
          \s([^\d]+)  #first name
          \s([\d\.]+) #data1
          \s([\d\.]+) #data2
          \s([\d\.]+) #data3
          \s([\d\.]+) #data4
          \s(\w+)     #country code
          \s([\d\.]+) #data5
        /x;
    printf "%s\n", join "\t", @cols;
}

__DATA__
1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00
77 (84) MONTOYA, INIGO CONQUISTADOR 100.22 23 16.00 20.00 ESP 2.00




Yet another regex question

2006-01-12 Thread Ted Schuerzinger
I'm fairly good at using regexes to find things, but using them to  
*replace* things is something I find quite difficult.  I have a text file  
with lines like this:


<snip>

1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
[...]
28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00

</snip>

that I want to turn into a tab-delimited file.  Unfortunately, I can't  
simply turn all spaces into tabs: note that there are people with two-word  
surnames.  However, I'm having difficulty coming up with any ideas on how  
to find information on either side of a space, and changing that into  
something plus a tab.  I tried the following to deal with the country  
codes:


while (<FILEFROM>) {
 chomp;
 if ($_ =~/\d\s[A-Z]{3}\s/) {
 $_ = s/$1/$1\t/g;
 }
 print FILETO "$_\n";
}

But all I got was a file with a bunch of lines of 1's.  I tried escaping  
the $'s, figuring it wouldn't help, and it didn't: I ended up with an  
empty file.  I also tried putting () around each of the $'s, and that gave  
me an even odder file, with each line containing a two-digit number, with  
no relationship I can spot between the numbers and the country codes: USA  
produces 54, 51, and 52 the first three times it's matched, while RUS  
produces 50, 54, 54, and 56 the first four times it's matched.


I've tried reading perlre, and it's given me no help.  I don't know where  
to begin.


--
Ted Schuerzinger, [EMAIL PROTECTED]



Re: Yet another regex question

2006-01-12 Thread Артем Аветисян
If you want tab instead of space after each country code, try this:

while (<FILEFROM>) {
    if (/\d\s[A-Z]{3}\s/) {
        s/(\d\s[A-Z]{3})\s/$1\t/g;
    }
    print FILETO $_;
}
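
A short sketch of why the original attempt printed 1's. Writing $_ = s/.../.../ assigns the value of the substitution, which is the number of replacements made, whereas =~ binds the substitution to the string. The sub name tab_after_country is invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: put a tab after the three-letter country code.
sub tab_after_country {
    my ($line) = @_;
    $line =~ s/(\d\s[A-Z]{3})\s/$1\t/g;   # =~ binds s/// to $line
    return $line;
}

my $line  = "1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00";
my $count = ($line =~ s/\sUSA\s/ USA\t/);   # the value of s/// is a count
print "substitutions made: $count\n";
print tab_after_country("2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00"), "\n";
```

The parentheses in the pattern are what make $1 available in the replacement; without a capture group $1 is not set by that match.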


 
 I'm fairly good at using regexes to find things, but using them to  
 *replace* things is something I find quite difficult.  I have a text file  
 with lines like this:
 
 snip
 
 1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
 2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
 [...]
 28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
 29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
 30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00
 
 /snip
 
 that I want to turn into a tab-delimited file.  Unfortunately, I can't  
 simply turn all spaces into tabs: note that there are people with two-word  
 surnames.  However, I'm having difficulty coming up with any ideas on how  
 to find information on either side of a space, and changing that into  
 something plus a tab.  I tried the following to deal with the country  
 codes:
 
 while (<FILEFROM>) {
   chomp;
   if ($_ =~/\d\s[A-Z]{3}\s/) {
   $_ = s/$1/$1\t/g;
   }
   print FILETO "$_\n";
 }
 
 But all I got was a file with a bunch of lines of 1's.  I tried escaping  
 the $'s, figuring it wouldn't help, and it didn't: I ended up with an  
 empty file.  I also tried putting () around each of the $'s, and that gave  
 me an even odder file, with each line containing a two-digit number, with  
 no relationship I can spot between the numbers and the country codes: USA  
 produces 54, 51, and 52 the first three times it's matched, while RUS  
 produces 50, 54, 54, and 56 the first four times it's matched.
 
 I've tried reading perlre, and it's given me no help.  I don't know where  
 to begin.
 
 -- 
 Ted Schuerzinger, [EMAIL PROTECTED]
 


AW: Yet another regex question

2006-01-12 Thread Dietmar Fiehn, Dr.
Probably not the best, but a working solution is to split the string using 
split, find out whether there are too many fields, assume the extra ones 
belong to two-or-more-word names, join the corresponding name fields with a 
space, and join the overall result with tabs.
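
A minimal sketch of that split-and-join idea. It assumes, as in the sample lines, that every record has two leading fields and six trailing fields, so whatever is left in the middle is the name. The helper name to_tabbed is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one ranking line into a tab-delimited string.
# Assumption: two leading fields, six trailing fields, name in between.
sub to_tabbed {
    my ($line) = @_;
    my @f = split ' ', $line;          # split on runs of whitespace
    return if @f < 9;                  # need at least one word for the name
    my @head = splice @f, 0, 2;        # rank and previous rank
    my @tail = splice @f, -6;          # the six columns after the name
    # @f now holds only the name words; glue them back with spaces.
    return join "\t", @head, join(' ', @f), @tail;
}

print to_tabbed('29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00'), "\n";
```

Counting fields from both ends means the number of words in the name does not matter, which a single regex has a harder time expressing.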
 
Dietmar
 
--- snip ---
 
I'm fairly good at using regexes to find things, but using them to 
*replace* things is something I find quite difficult.  I have a text file 
with lines like this:

snip

1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
[...]
28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00

/snip

that I want to turn into a tab-delimited file.  Unfortunately, I can't 
simply turn all spaces into tabs: note that there are people with two-word 
surnames.  However, I'm having difficulty coming up with any ideas on how 
to find information on either side of a space, and changing that into 
something plus a tab.  I tried the following to deal with the country 
codes:

while (<FILEFROM>) {
  chomp;
  if ($_ =~/\d\s[A-Z]{3}\s/) {
  $_ = s/$1/$1\t/g;
  }
  print FILETO "$_\n";
}

But all I got was a file with a bunch of lines of 1's.  I tried escaping 
the $'s, figuring it wouldn't help, and it didn't: I ended up with an 
empty file.  I also tried putting () around each of the $'s, and that gave 
me an even odder file, with each line containing a two-digit number, with 
no relationship I can spot between the numbers and the country codes: USA 
produces 54, 51, and 52 the first three times it's matched, while RUS 
produces 50, 54, 54, and 56 the first four times it's matched.

I've tried reading perlre, and it's given me no help.  I don't know where 
to begin.

--
Ted Schuerzinger, [EMAIL PROTECTED]



RE: Yet another regex question

2006-01-12 Thread Joe Discenza

Ted Schuerzinger wrote, on Thu 12-Jan-06 08:45:
: I have a text file with lines like this:
:
: 1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
: 2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
: [...]
: 28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
: 29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
: 30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00
:
: that I want to turn into a tab-delimited file. Unfortunately, I can't
: simply turn all spaces into tabs: note that there are people with
: two-word surnames.

Part of the problem with this code

: if ($_ =~/\d\s[A-Z]{3}\s/) {
:   $_ = s/$1/$1\t/g;
: }

is that you have no capturing parentheses to populate $1. Toss this code.

You seem to have a pretty good picture of your data; why not turn that into
a regex completely, instead of doing it piecemeal?

/(\d+)\s+\((\d+)\)\s+([A-Z\s]+),\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+([A-Z]{3})\s+(\S+)/

and have a replace section that strings together all your captures with
tabs between:

s/.../$1\t$2\t$3\t$4\t$5\t$6\t$7\t$8\t$9\t${10}/

You don't need the parentheses around field 2, or the comma after the last
name, do you? If so, you can put those inside the captures.
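
As a rough, runnable sketch of that whole-line substitution: the wrapper name tabify is invented here, the pattern is spread out with /x purely for readability, and it assumes ten columns per line as in the sample data (single-word given names only).

```perl
use strict;
use warnings;

# Hypothetical wrapper around the whole-line substitution. A line that does
# not fit the assumed layout is returned unchanged.
sub tabify {
    my ($line) = @_;
    $line =~ s{^(\d+)\s+\((\d+)\)\s+            # rank, previous rank
               ([A-Z\s]+),\s+([A-Z]+)\s+        # surname words, given name
               (\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+ # four numeric columns
               ([A-Z]{3})\s+(\S+)               # country code, last column
              }{$1\t$2\t$3\t$4\t$5\t$6\t$7\t$8\t$9\t${10}}x;
    return $line;
}

print tabify('29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00'), "\n";
```

The braces around 10 in ${10} keep Perl from reading the reference as $1 followed by a literal 0.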

Good luck,

Joe

==
Joseph P. Discenza, Sr. Programmer/Analyst
mailto:[EMAIL PROTECTED]
Carleton Inc. http://www.carletoninc.com
574.243.6040 ext. 300 fax: 574.243.6060
Providing Financial Solutions and Compliance for over 30 Years





Re: Yet another regex question

2006-01-12 Thread Chris Wagner
At 08:45 AM 1/12/2006 -0500, [EMAIL PROTECTED] wrote:
1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
[...]
28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00


print join "\t", $line =~ m/^(\d+) \((\d+)\) ([a-zA-Z ]+), (\w+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+)$/;






--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis

0100



RE: Yet another regex question

2006-01-12 Thread Thomas, Mark - BLS CTR
 
 I'm fairly good at using regexes to find things, but using them to  
 *replace* things is something I find quite difficult.  I have 
 a text file with lines like this:
 
 snip
 
 1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
 2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
 [...]
 28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
 29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
 30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00
 
 /snip
 
 that I want to turn into a tab-delimited file.  

Well, you haven't let us know what the output is supposed to look like,
but try this for starters:

while (<DATA>) {
    my @cols =
        m/^(\d+)      #id1
          \s(\(\d+\)) #id2
          \s([\w ]+), #lastnames
          \s(\w+)     #first name
          \s([\d\.]+) #data1
          \s([\d\.]+) #data2
          \s([\d\.]+) #data3
          \s([\d\.]+) #data4
          \s(\w+)     #country code
          \s([\d\.]+) #data5
        /x;
    printf "%s\n", join "\t", @cols;
}

__DATA__
1 (1) DAVENPORT, LINDSAY 3380.00 16 .00 49.00 USA .00
2 (2) CLIJSTERS, KIM 3206.00 17 .00 .00 BEL .00
28 (28) MOLIK, ALICIA 671.00 15 .00 195.00 AUS .00
29 (33) MEDINA GARRIGUES, ANABEL 660.75 27 30.00 10.00 ESP 2.00
30 (35) KOUKALOVA, KLARA 660.75 23 16.00 20.00 CZE 2.00




Re: Yet another regex question

2006-01-12 Thread John Deighan
At 10:19 AM 1/12/2006, Артем Аветисян wrote:

> If you want tab instead of space after each country code, try this:
>
> while (<FILEFROM>) {
>   if (/\d\s[A-Z]{3}\s/) {
>     s/(\d\s[A-Z]{3})\s/$1\t/g;
>   }
>   print FILETO $_;
> }



I don't see the point of the if statement. Why not just do the 
s/.../g as the only statement in the while loop?




Re: Yet another regex question

2006-01-12 Thread James Sluka




Ted wrote:

> while (<FILEFROM>) {
>  chomp;
>  if ($_ =~/\d\s[A-Z]{3}\s/) {
>  $_ = s/$1/$1\t/g;
>  }
>  print FILETO "$_\n";
> }


You were close Ted but there are a couple problems.
1.  $_ = s/$1/$1\t/g; should be $_ =~ s/$1/$1\t/g; (you left out the ~)

2.  $1 isn't defined anywhere in this code since there are no paren's in the 
first REGEX.
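
To make point 1 concrete, here is a small sketch (the sub names are made up for illustration) of what changes when the ~ is left out. With a plain =, the s/// runs against $_ and its return value, the number of substitutions made, is what gets stored.

```perl
use strict;
use warnings;

# What the original loop effectively did: "=" stores the return value of
# s///, which is the count of substitutions (or an empty string for none).
sub with_assignment {
    local $_ = shift;
    $_ = s/(\d\s[A-Z]{3})\s/$1\t/g;
    return $_;
}

# What was intended: "=~" binds the substitution to the variable, which is
# then edited in place.
sub with_binding {
    my ($text) = @_;
    $text =~ s/(\d\s[A-Z]{3})\s/$1\t/g;
    return $text;
}

print with_assignment("49.00 USA .00"), "\n";   # prints 1
print with_binding("49.00 USA .00"), "\n";      # prints the line with a tab after USA
```

That stored count is consistent with an output file full of 1's.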

Try this (similar to what Artemave posted):
while (<FILEFROM>) {
 chomp;
 $_ =~ s/(\d\s[A-Z]{3})\s/$1\t/;
 print FILETO "$_\n";
}



Re[2]: Yet another regex question

2006-01-12 Thread Артем Аветисян
True. Wanted to be the first replier ;)

Artem A. Avetisyan

 
 At 10:19 AM 1/12/2006, Артем Аветисян wrote:
 If you want tab instead of space after each country code, try this:
 
 while (<FILEFROM>) {
   if (/\d\s[A-Z]{3}\s/) {
     s/(\d\s[A-Z]{3})\s/$1\t/g;
   }
   print FILETO $_;
 }
 
 
 I don't see the point of the if statement. Why not just do the 
 s/.../g as the only statement in the while loop.
 


RE: Yet another regex question

2006-01-12 Thread Ted Schuerzinger
Joe Discenza [EMAIL PROTECTED] graced perl with these words
of wisdom: 

 You seem to have a pretty good picture of your data; why not turn
 that into a regex completely, instead of doing it piecemeal? 
  
 /(\d+)\s+\((\d+)\)\s+([A-Z\s]+),\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+
 ([A-Z]{3})\s+(\S+)/ 
  
 and have a replace section that strings together all your captures
 with tabs between: 
  
 s/.../$1\t$2\t$3\t$4\t$5\t$6\t$7\t$8\t$9\t${10}/
 
I'd like to thank everybody who came up with suggestions.  One thing I 
forgot to point out is that there are also people with whitespace in their 
*given* names, which seems to make things even more problematic 
(backtracking and all that).

Somebody off-list gave me the suggestion not of using a regex, but of 
splitting each line on the \s characters, and then manipulating arrays 
with pop and shift and reverse.  That's something that I'm decidedly more 
able to handle, although I have a question -- when I used this bit of 
code:

snip

$transfer[2] = scalar @line; # @line only has the names left
   for $x (0 .. 8) {
   print FILETO "$transfer[$x]\t";
   }
   print FILETO "\n";
}

The results were as follows:

snip

1   (1) 2   3380.00 16  .00 49.00   USA .00
2   (2) 2   3206.00 17  .00 .00 BEL .00 
3   (3) 2   2851.00 19  .00 .00 FRA 1.00

[...]

/snip

I had to amend the code to add an if/else clause for the $x=2 case:

$transfer[2] = scalar @line;
   for $x (0 .. 8) {
   if ($x == 2) {
      print FILETO "@line\t";
   }
   else {
      print FILETO "$transfer[$x]\t";
   }
   }
   print FILETO "\n";
}

in order to get it to work.  Any ideas why?

-- 
Ted fedya at bestweb dot net
Oh Marge, anyone can miss Canada, all tucked away down there
--Homer Simpson



RE: Yet another regex question

2006-01-12 Thread Chris Wagner
At 04:43 PM 1/12/2006 -0500, Ted Schuerzinger wrote:
> $transfer[2] = scalar @line; # @line only has the names left
>    for $x (0 .. 8) {
>    print FILETO "$transfer[$x]\t";
>    }
>    print FILETO "\n";
> }

If @line contains the name components then u need to do join " ", @line to
get the contents out.  scalar @line returns the number of elements.
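
A short illustration of the difference, using one of the names from the sample data. The helper name_field is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the name column from the leftover words.
sub name_field { return join ' ', @_ }

my @line = ('MEDINA', 'GARRIGUES,', 'ANABEL');

print scalar(@line), "\n";       # the element count, here 3
print name_field(@line), "\n";   # the words joined with single spaces
print "@line\n";                 # interpolation also joins with a space by default
```

The last line is why the amended if/else version worked: an array interpolated in a double-quoted string is joined with the list separator, which is a space unless it has been changed.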





--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis

0100



Re: Regex Newbie Q: Non-Trivial Substitution and Modifying the Matched String

2005-10-10 Thread Kenneth McNamara


Veli-Pekka

You're implying that the music macro language is pos() sensitive.   
That is a pretty severe problem in itself.


Can you output the macro language in a format that is not position  
sensitive - do a global change - then process the macro language back  
into original format?


I think I'd be more inclined to use a $noteindex{$note} = $pitch  
hash table than a regex - then process the input one note at a time.


KenMc


On Oct 9, 2005, at 2:58 PM, Veli-Pekka Tätilä wrote:


Hi,
Yet another newbie question about regular expressions:
I'd like to find and replace bits of text as usual. However, rather  
than replace all occurrences in one quick swoop using the  
s-operator and the g-flag, the replacement is so complex that it  
cannot be expressed as a straight substitution. So I would have to  
find a piece of text, process it in a separate function, and  
replace the matched text with the newly computed text. This goes on  
for n interesting matches in the input.


Can I do this kind of thing in a simple loop, processing all  
matches one by one? My understanding is that pos and some special  
variables will tell me the character index of the match in a string.  
But if I then go and modify the string using substr to do the  
substitution, will it reset the search position to the beginning  
when trying to match the next interesting bit? The replacement text  
is by nature longer than the original so the input string needs to  
grow on each substitution which might present a problem to the  
matching operator. I took a look at perlop and some books I have on  
Perl but didn't end up with a definitive answer of how I should  
solve this problem.


That was the problem abstractly put, here's the specific instance:
I'm writing a program to convert notes given in the music macro  
language to their equivalent pitches that are applied using markup  
tags for speech synthesizers. I can match a note easily, and have  
functions for computing the pitch and the tag in question. As the  
note values don't depend on each other in any way, I'd like to  
completely process one note at a time, doing the replacement, and  
then continue matching the next note where it previously left off.  
Naturally there are other tokens than just notes in the input so I  
need to maintain the position of the note data in the input string.  
If I modify a separate copy of the string, it will throw off the  
pos() indices because the substitutions will change the length of the  
copy.


Any help appreciated as usual.

PS: If the problem statement is still a bit fuzzy or incomplete,  
just ask and I'll try to provide more info.


--
With kind regards Veli-Pekka Tätilä ([EMAIL PROTECTED])
Accessibility, game music, synthesizers and programming:
http://www.student.oulu.fi/~vtatila/


RE: Regex Newbie Q: Non-Trivial Substitution and Modifying the MatchedString

2005-10-10 Thread Joe Discenza
"Veli-Pekka Tätilä" wrote, on Sun 10/9/2005 15:58:
: Yet another newbie question about regular expressions:
: I'd like to find and replace bits of text as usual. However, rather than
: replace all occurrences in one quick swoop using the s-operator and the
: g-flag, the replacement is so complex that it cannot be expressed as a
: straight substitution. So I would have to find a piece of text, process it
: in a separate function, and replace the matched text with the newly computed
: text. This goes on for n interesting matches in the input.

I can't tell from your note if you've investigated the /e flag yet, which
allows you to replace a chunk of text with the result of a function call:

s/(stuff that's not a note)?(note)/$1 . tag_from_note($2)/ge;
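
A self-contained sketch of the /e idea. The note syntax (letters a to g with an optional + or -) and the tag format are assumptions made up for the example, and tag_from_note here is only a stand-in for the real pitch computation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real pitch computation; the tag format is
# invented for illustration.
sub tag_from_note {
    my ($note) = @_;
    return '<pitch note="' . uc($note) . '"/>';
}

# Replace every note in the input with its computed tag.
sub convert {
    my ($mml) = @_;
    # With /e the replacement side is Perl code, evaluated once per match.
    $mml =~ s/\b([a-g][+-]?)/tag_from_note($1)/ge;
    return $mml;
}

print convert('o4 c d+ r e'), "\n";
```

Because s///g carries on after each replacement rather than rescanning it, the replacement can be longer than the matched text without disturbing the search position.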

Good luck,

Joe


==
Joseph P. Discenza, Sr. Programmer/Analyst
mailto:[EMAIL PROTECTED]
Carleton Inc. http://www.carletoninc.com
574.243.6040 ext. 300 fax: 574.243.6060
Providing Financial Solutions and Compliance for over 30 Years





Re: Regex Newbie Q: Non-Trivial Substitution and Modifying the Matched String

2005-10-10 Thread Veli-Pekka Tätilä

Kenneth McNamara wrote:

> You're implying that the music macro language is pos() sensitive.
> That is a pretty severe problem in itself.

Hmm, I'm not totally sure if it is. But true, certain modifiers apply until 
the next note, such as one or more periods. My MML experience is actually 
from trying to write some tunes for the 8-bit Nintendo to see what that's 
like compared to ordinary MIDI and audio sequencing.


Unfortunately, most of the MML docs I've seen are in Japanese and I only 
have some English tutorials related to the Nintendo.


This is not that big a hurdle in this project, though. I only took the 
inspiration for ASCII note data from MML, it's not going to be a 
fully-fledged MML parser or anything. I'm helping out a fellow musician to 
do singing synthesis using ordinary speech synths not originally meant for 
that purpose. Tiny Perl using Win32::FileOp and Win32::GuiTest seems to be 
great for this purpose. As a screen reader user I know nearly all Windows 
keyboard hotkeys by heart, so programmatically interacting with most GUI 
controls is a breeze.


> I think I'd be more inclined to use a $noteindex{$note} = $pitch hash 
> table

I'm using one hash that maps from note names including sharps to their 
pitches that are pre-computed at program startup. Then I just need to 
multiply the pitch to do an octave shift.


If I were to parse the whole of the MML language I bet even regexps could not 
be used to simply match the whole thing. I'm currently going through a 
tutorial on recursive descent parsers and writing the stuff in, you guessed 
it, Perl, rather than Pascal. But this is getting OT, at least as far as my 
original problem goes, which was cleanly solved by at least two people 
already, nice.


--
With kind regards Veli-Pekka Tätilä ([EMAIL PROTECTED])
Accessibility, game music, synthesizers and programming:
http://www.student.oulu.fi/~vtatila/ 




Re: Regex Newbie Q: Non-Trivial Substitution and Modifying the MatchedString

2005-10-10 Thread Veli-Pekka Tätilä

Joe Discenza wrote:

> Veli-Pekka Tätilä wrote, on Sun 10/9/2005 15:58
>
>> replacement is so complex that it cannot be
>> expressed as a straight substitution. So I would have to find a
>> piece of text, process it in a separate function, and replace the
>> matched text with the newly computed text. This goes on for n
>> interesting matches in the input.
>
> I can't tell from your note if you've investigated the /e flag yet,
> that allows you to replace a chunk of text with the result of a
> function call:
> s/(stuff that's not a note)?(note)/$1 . tag_from_note($2)/ge;

Hi Joe,
The e-flag was just what I was looking for, thanks. Being Perl, I guessed 
there would be some easy way of achieving the desired effect. The perlop 
page is not that hierarchically structured so I missed the e-flag there. 
Partly because there are only a couple of lines about it but that's what you 
get in a reference manual, grin.


--
With kind regards Veli-Pekka Tätilä ([EMAIL PROTECTED])
Accessibility, game music, synthesizers and programming:
http://www.student.oulu.fi/~vtatila/ 




Re: Regex Newbie Q: Non-Trivial Substitution and Modifying the Matched String

2005-10-10 Thread Andy_Bach
Not sure I'm getting it completely, but using match in a while loop w/ the 
/g modifier lets you process a string one match at a time:
my $string = "Lots of words to be read one at a time.\nthough more than one line";
while ( $string =~ /(\w+)/g ) {
   print "Found: $1\n";
   print "Processing ...\n";
}   # while /(\w+)/g

While the original string should be left alone (so pos and \G (the marker 
for the last match) don't get confused) you can process/chop up a copy of 
the string as you go. To get really tricky, you can munge the string and 
matches via assignments to pos() - best to look at Mastering Regular 
Expressions (O'Reilly/Friedl - or better, buy it!) Chapter 7ish. That's 
not for the fainthearted.
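
One way to follow the "leave the original alone" advice is to build the output as a new string while the /g loop walks the input. This is only a sketch: replace_each is a made-up helper, and it relies on the @- and @+ arrays, which hold the start and end offsets of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every match of $re in $in with $cb->(match),
# building a new string so the string being searched is never modified.
sub replace_each {
    my ($in, $re, $cb) = @_;
    my ($out, $last) = ('', 0);
    while ($in =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);          # offsets of this match
        my $matched = substr($in, $start, $end - $start);
        $out .= substr($in, $last, $start - $last);  # untouched text before it
        $out .= $cb->($matched);                     # computed replacement
        $last = $end;
    }
    return $out . substr($in, $last);                # text after the last match
}

print replace_each('c d+ r e', qr/[a-g][+-]?/, sub { "<$_[0]>" }), "\n";
```

Since only the output grows, pos() on the input stays valid however long each replacement is.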

@nums = $data =~ m/\d+/g; # pick apart string, returning a list of numbers

Suppose the list of numbers starts *after* a marker - "xx" - prime the 
'pos':

$data =~ m/xx/g; # prime the /g start; pos($data) now points to just 
                 # after the "xx"
@nums = $data =~ m/\d+/g; # pick apart the rest of the string, returning a 
                          # list of numbers

Or:
pos($data) = $i if ($i = index($data, "xx")) >= 0;   # find the "xx"
@nums = $data =~ m/\d+/g; # pick apart the rest of the string, returning a 
                          # list of numbers

The difference here is that pos is just before the "xx", while in the 
previous example it's just after, but ...

a

Andy Bach, Sys. Mangler
Internet: [EMAIL PROTECTED] 
VOICE: (608) 261-5738  FAX 264-5932

 History will have to record that the greatest tragedy of this period of
social transition was not the strident clamor of the bad people, but the
appalling silence of the good people. 
Martin Luther King, Jr.


Regex Newbie Q: Non-Trivial Substitution and Modifying the Matched String

2005-10-09 Thread Veli-Pekka Tätilä

Hi,
Yet another newbie question about regular expressions:
I'd like to find and replace bits of text as usual. However, rather than 
replace all occurrences in one quick swoop using the s-operator and the 
g-flag, the replacement is so complex that it cannot be expressed as a 
straight substitution. So I would have to find a piece of text, process it 
in a separate function, and replace the matched text with the newly computed 
text. This goes on for n interesting matches in the input.


Can I do this kind of thing in a simple loop, processing all matches one by 
one? My understanding is that pos and some special variables will tell me the 
character index of the match in a string. But if I then go and modify the 
string using substr to do the substitution, will it reset the search 
position to the beginning when trying to match the next interesting bit? The 
replacement text is by nature longer than the original so the input string 
needs to grow on each substitution which might present a problem to the 
matching operator. I took a look at perlop and some books I have on Perl but 
didn't end up with a definitive answer of how I should solve this problem.


That was the problem abstractly put, here's the specific instance:
I'm writing a program to convert notes given in the music macro language to 
their equivalent pitches that are applied using markup tags for speech 
synthesizers. I can match a note easily, and have functions for computing 
the pitch and the tag in question. As the note values don't depend on each 
other in any way, I'd like to completely process one note at a time, doing 
the replacement, and then continue matching the next note where it 
previously left off. Naturally there are other tokens than just notes in the 
input so I need to maintain the position of the note data in the input 
string. If I modify a separate copy of the string, it will throw off the 
pos() indices because the substitutions will change the length of the copy.


Any help appreciated as usual.

PS: If the problem statement is still a bit fuzzy or incomplete, just ask 
and I'll try to provide more info.


--
With kind regards Veli-Pekka Tätilä ([EMAIL PROTECTED])
Accessibility, game music, synthesizers and programming:
http://www.student.oulu.fi/~vtatila/ 




Re: Regex Newbie Q: Non-Trivial Substitution and Modifying the MatchedString

2005-10-09 Thread Chris Wagner
So what ur saying is that u want to do a lot of substitutions in one pass.
U could always have one s///g for each thing u want substituted and run it n
times.  Can u post some actual strings that u want to parse and the
substitutions?  Then we can figure it out.

At 10:58 PM 10/9/05 +0300, Veli-Pekka Tätilä wrote:
That was the problem abstractly put, here's the specific instance:
I'm writing a program to convert notes given in the music macro language to 
their equivalent pitches that are applied using markup tags for speech 
synthesizers. I can match a note easily, and have functions for computing 
the pitch and the tag in question. As the note values don't depend on each 
other in any way, I'd like to completely process one note at a time, doing 
the replacement, and then continue matching the next note where it 
previously left off. Naturally there are other tokens than just notes in the 
input so I need to maintain the position of the note data in the input 
string. If I modify a separate copy of the string, it will throw off the 
pos() indices because the substitutions will change the length of the copy.





--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis

0100



Re: Regex

2005-09-28 Thread $Bill Luebkert
Chris Wagner wrote:

> At 05:11 PM 9/27/05 -0700, $Bill Luebkert wrote:
>
>> \s* means to grab any WS at the current position (including the case where
>>    there is none).
>>
>> \s*? means 0 or 1 of the above which is totally meaningless - you've already
>>    eaten all the WS with the \s*, so in my opinion the ? is redundant to
>>    what you have already done.

I retract the above \s*? stmt.  \s*? won't grab any WS in this case because
(\s*?) is not the same as (\s*)? (which is what I was thinking).

> Redundant vs. Useless!  Semantic battle of the century!!  Who's right and
> who's wrong: and will be put to DEATH!
>
> Redundant:
> m/\s*\s/;   # Specifying something again when it was already specified
>
> Useless:
> m/xyz\s*?$/;# Specifying something that does nothing
>
> * maximal match, eat up as many characters as possible to make the overall
> expression match
> *? minimal match, eat up as few characters as possible to make the overall
> expression match

So in my revised opinion, it's not redundant, but it's not useless either -
it's plain wrong.  I'm sure the intent here is to eat as much WS as there is
and that's not what \s*? will do for you.  \s*? won't eat any WS.

> That rule doesn't disappear just because a certain character sequence was
> specified.  *? is only *useful* when used with wildcards since it will decay
> to a nul if used with a fixed string.  The minimum of the range 0 to inf is 0.

-- 
  ,-/-  __  _  _ $Bill LuebkertMailto:[EMAIL PROTECTED]
 (_/   /  )// //   DBE CollectiblesMailto:[EMAIL PROTECTED]
  / ) /--  o // //  Castle of Medieval Myth  Magic http://www.todbe.com/
-/-' /___/__/_/_http://dbecoll.tripod.com/ (My Perl/Lakers stuff)


Re: Regex

2005-09-27 Thread robert johnson



David Budd wrote:
I thought this was working, but my logs just showed a case where it 
seems not to do what I want.

Why does:
$OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
Not become true when $body contains:
Library Card: 0240742



i'd bet that it passes when you have some whitespace after the number
and fails when it doesn't.  that's a common problem and can be hard to trace.

in any event, your ending \D *requires* a non-digit character to
immediately follow the number.  since your example failed, it must be
terminating the line.  adding a ? or * to the \D will not behave as (I
believe) you intend, i.e. it will not filter out numbers with 8 or more
digits

so IF your string always terminates the line (with or without
whitespace), this will require exactly seven digits and not care whether
or not whitespace follows:

 $OK_body = ($body =~ /library\s*card\s*:?\s*(\d{7})\s*$/i)

However...  if the "library card: #" string does not always terminate
the line and/or if you need to allow the possibility of non-digit
characters immediately after the 7-digit number (example: "library card:
1234567MORE_STUFF"), then you will need to use a word boundary:

 $OK_body = ($body =~ /library\s*card\s*:?\s*(\d{7})\D*\b/i)

and by the way,  *? is redundant.
* means zero or more.
? means zero or one.

--rob



Re: Regex

2005-09-27 Thread Chris Wagner
At 12:08 AM 9/27/05 -0700, robert johnson wrote:
and by the way,  *? is redundant.
* means zero or more.
? means zero or one.


Actually the *? construct is not a redundancy.  It calls for a minimal match
rather than a maximal match, which is the default.  Although it was useless
in the example. ;)
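
A quick sketch of the difference, with made-up sub names and a made-up sample string. The first pair shows where minimal and maximal matching diverge; the last line shows a case like the one discussed, where both spellings accept the same strings.

```perl
use strict;
use warnings;

# Capture the text between angle brackets with a maximal (greedy) quantifier.
sub greedy_capture {
    my ($s) = @_;
    return $s =~ /<(.*)>/ ? $1 : undef;
}

# Same pattern with a minimal (non-greedy) quantifier.
sub lazy_capture {
    my ($s) = @_;
    return $s =~ /<(.*?)>/ ? $1 : undef;
}

my $text = 'a<b>c<d>e';
print "greedy: ", greedy_capture($text), "\n";   # b>c<d
print "lazy:   ", lazy_capture($text), "\n";     # b

# Directly before an end anchor, \s* and \s*? succeed on the same inputs.
print "both match\n" if "xyz  " =~ /xyz\s*$/ and "xyz  " =~ /xyz\s*?$/;
```

The quantifier only changes which of several possible matches is chosen, not whether a match exists.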




--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis

0100



Re: Regex

2005-09-27 Thread $Bill Luebkert
Chris Wagner wrote:

> At 12:08 AM 9/27/05 -0700, robert johnson wrote:
>
>> and by the way,  *? is redundant.
>> * means zero or more.
>> ? means zero or one.
>
> Actually the *? construct is not a redundancy.  It calls for a minimal match
> rather than a maximal match, which is the default.  Although it was useless
> in the example. ;)

Maybe "wrong" would be a better term for you?

\s* means to grab any WS at the current position (including the case where
there is none).

\s*? means 0 or 1 of the above which is totally meaningless - you've already
eaten all the WS with the \s*, so in my opinion the ? is redundant to
what you have already done.

-- 
  ,-/-  __  _  _ $Bill LuebkertMailto:[EMAIL PROTECTED]
 (_/   /  )// //   DBE CollectiblesMailto:[EMAIL PROTECTED]
  / ) /--  o // //  Castle of Medieval Myth  Magic http://www.todbe.com/
-/-' /___/__/_/_http://dbecoll.tripod.com/ (My Perl/Lakers stuff)


Re: Regex

2005-09-25 Thread $Bill Luebkert
John wrote:

> David Budd wrote:
>
>> I thought this was working, but my logs just showed a case where it seems not
>> to do what I want.
>> Why does:
>> $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
>> Not become true when $body contains:
>> Library Card: 0240742
>>
>> Just possibly there's some dodgy html or something in the original that
>> doesn't make it through to my logs, but right now I'm perplexed
>
> Having looked at other replies to this, isn't it irrelevant what comes
> after the (\d{7}) part of the re.

Depends - I'd put a \b after it if you want to make sure there are no more
characters in the field/number.

-- 
  ,-/-  __  _  _ $Bill LuebkertMailto:[EMAIL PROTECTED]
 (_/   /  )// //   DBE CollectiblesMailto:[EMAIL PROTECTED]
  / ) /--  o // //  Castle of Medieval Myth  Magic http://www.todbe.com/
-/-' /___/__/_/_http://dbecoll.tripod.com/ (My Perl/Lakers stuff)


Re: Regex

2005-09-25 Thread Chris Wagner
At 02:42 PM 9/25/05 +1000, [EMAIL PROTECTED] wrote:
>> Why does:
>> $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
>> Not become true when $body contains:
>> Library Card: 0240742
> Having looked at other replies to this, isn't it irrelevant what comes
> after the (\d{7}) part of the re.

It's not true because the final \D requires that something be present
*after* 7 digits are found.






--
REMEMBER THE WORLD TRADE CENTER ---= WTC 911 =--
...ne cede malis



Re: Regex

2005-09-24 Thread John

David Budd wrote:

> I thought this was working, but my logs just showed a case where it seems not
> to do what I want.
> Why does:
> $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
> Not become true when $body contains:
> Library Card: 0240742
>
> Just possibly there's some dodgy html or something in the original that doesn't
> make it through to my logs, but right now I'm perplexed


Having looked at other replies to this, isn't it irrelevant what comes
after the (\d{7}) part of the re?


--
Regards
   John McMahon  ([EMAIL PROTECTED])






RE: Regex

2005-09-21 Thread David Budd

 
> You state that there must be a NON numeric at end of
> line. I would have \D* or \D*$.

Excellent suggestion. I shall implement it forthwith.

Part of the reason I was perplexed was that this script ran for a year with 
nobody complaining.
In fact, I discovered some time after I'd posted that it's not the regex that's 
failing, it's some execrable code doing the logging - I hadn't cleared a 
variable at the beginning of a loop, so the log messages sometimes referred to 
the previous iteration. I guess we were just lucky over the last year that on 
the odd occasions we had to sort problems by checking the logs, it didn't 
affect anything.
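
For anyone hitting the same thing, here is a minimal sketch of that kind of logging bug. It is not the actual script: the sub names, the sample bodies and the log format are all made up for illustration. The only difference between the two subs is where $card is declared.

```perl
use strict;
use warnings;

my @bodies = ('Library Card: 0240742', 'no number here', 'Library Card: 1234567');

# Buggy shape: $card lives outside the loop and is never cleared,
# so a body with no match logs the previous iteration's value.
sub log_stale {
    my @log;
    my $card;
    for my $body (@_) {
        $card = $1 if $body =~ /library\s*card\D*(\d{7})/i;
        push @log, defined $card ? "card=$card" : "card=none";
    }
    return @log;
}

# Fixed shape: $card is declared inside the loop, so it starts
# out undefined on every iteration.
sub log_fresh {
    my @log;
    for my $body (@_) {
        my $card;
        $card = $1 if $body =~ /library\s*card\D*(\d{7})/i;
        push @log, defined $card ? "card=$card" : "card=none";
    }
    return @log;
}

print join(',', log_stale(@bodies)), "\n";
print join(',', log_fresh(@bodies)), "\n";
```

With use strict in place, declaring the variable with my inside the loop is usually the simplest way to make this class of bug impossible.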




RE: Regex

2005-09-21 Thread Chris Wagner
At 09:07 AM 9/21/05 +0100, [EMAIL PROTECTED] wrote:
>> You state that there must be a NON numeric at end of
>> line. I would have \D* or \D*$.
>
> Excellent suggestion. I shall implement it forthwith.

Actually, to make sure ur string ends with a non-numeric u need \D$ not
\D*$.  \D* matches 0 or more non-digits.  That is used for cases where a
string could end in \D but doesn't *have* to.

C:\WINDOWS\Desktop>perl
$bob = "abc123";
print "non-number end\n" if $bob =~ m/.+\D*$/;
***
non-number end








Regex

2005-09-20 Thread David Budd
I thought this was working, but my logs just showed a case where it seems not 
to do what I want.
Why does:
$OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
Not become true when $body contains:
Library Card: 0240742

Just possibly there's some dodgy html or something in the original that doesn't 
make it through to my logs, but right now I'm perplexed
-- 
David Budd, Applications section, IT Services
Kilburn Building, University of Manchester
Tel 56033 Email [EMAIL PROTECTED] 
http://www.its.man.ac.uk/applications 




RE: Regex

2005-09-20 Thread Joe Discenza

David Budd wrote, on Tue 9/20/2005 10:57:

: I thought this was working, but my logs just showed a case
: where it seems not to do what I want.
: Why does:
: $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
: Not become true when $body contains:
: Library Card: 0240742

Probably because your regex explicitly requires a non-digit (\D)
at the end, and your example line doesn't have it. Perhaps you want \D*$ so it
finds no more than 7 digits, or if more digits are allowed as long as a
non-digit intervenes, you might want (\D|$).
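
To see how the two suggested endings differ, here is a rough sketch. The sub names and the sample strings are assumptions for illustration, and the non-greedy ? marks are dropped for readability.

```perl
use strict;
use warnings;

# Ending 1: only non-digits may follow the number, up to the end.
my $to_end = qr/library\s*card\D*(\d{7})\D*$/i;

# Ending 2: the number must be followed by a non-digit or by the end;
# more digits may appear later in the string.
my $nondigit_or_end = qr/library\s*card\D*(\d{7})(?:\D|$)/i;

sub ok_to_end          { return $_[0] =~ $to_end          ? 1 : 0 }
sub ok_nondigit_or_end { return $_[0] =~ $nondigit_or_end ? 1 : 0 }

for my $body ('Library Card: 0240742',
              'Library Card: 0240742 ref 99',
              'Library Card: 02407421') {
    printf "%-30s to_end=%d nondigit_or_end=%d\n",
        $body, ok_to_end($body), ok_nondigit_or_end($body);
}
```

Both endings accept the bare example line and reject an eight-digit number. They only disagree when further digits turn up later in the text.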
Good luck,
Joe
== 
Joseph P. Discenza, Sr. 
Programmer/Analyst 
mailto:[EMAIL PROTECTED] 
Carleton Inc. http://www.carletoninc.com 
574.243.6040 ext. 300 fax: 574.243.6060
Providing Financial Solutions and Compliance for over 30 Years





Re: Regex

2005-09-20 Thread Siebe Tolsma
The string in $body (Library Card: 0240742) does not end in \D (a 
non-digit). You might want to add a * to that so the pattern matches both 
strings that _do_ end in \D (i.e. \n or \r\n or whatever comes after the ID) 
and those that end in the ID itself.




Re: Regex

2005-09-20 Thread $Bill Luebkert
David Budd wrote:

> I thought this was working, but my logs just showed a case where it seems not 
> to do what I want.
> Why does:
> $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;

What's the last \D for?  '\s*?' should just be '\s*' - same with \D*?.

> Not become true when $body contains:
> Library Card: 0240742
> 
> Just possibly there's some dodgy html or something in the original that 
> doesn't make it through to my logs, but right now I'm perplexed

use strict;
use warnings;

my $body = 'Library Card: 0240742';
my $OK_body = ($body =~ /library\s*card\D*(\d{7})/i);
print "OK_body = ", $OK_body ? 'true' : 'false', "\n";

__END__



RE: Regex

2005-09-20 Thread Wagner, David --- Senior Programmer Analyst --- WGO
[EMAIL PROTECTED] wrote:
> I thought this was working, but my logs just showed a case where it
> seems not to do what I want. 
> Why does:
> $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
> Not become true when $body contains:
> Library Card: 0240742
> 
> Just possibly there's some dodgy html or something in the original
> that doesn't make it through to my logs, but right now I'm perplexed 

You state that there must be a NON numeric at end of line. I would have 
\D* or \D*$.
Wags ;)




RE: Regex

2005-09-20 Thread Matthew Ryan
David wrote:

> I thought this was working, but my logs just showed a case where it seems
> not to do what I want.
> Why does:
> $OK_body=($body=~/library\s*?card\D*?(\d{7})\D/i) ;
> Not become true when $body contains:
> Library Card: 0240742
>
> Just possibly there's some dodgy html or something in the original that
> doesn't make it through to my logs, but right now I'm perplexed

Is there a chance that 'Library Card: 0240742' comes at the end of the file?
Or that you used chomp on the variable $body?
The \D at the end of the regular expression will not match if there isn't a
non-digit after the number.
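
The chomp effect is easy to reproduce. A minimal sketch, using the pattern from the original post (the helper name matches() is made up for illustration):

```perl
use strict;
use warnings;

# Pattern exactly as posted, including the trailing \D.
my $re = qr/library\s*?card\D*?(\d{7})\D/i;

# Hypothetical helper: 1 if the pattern matches, 0 otherwise.
sub matches {
    my ($body) = @_;
    return $body =~ $re ? 1 : 0;
}

my $line = "Library Card: 0240742\n";
print "before chomp: ", matches($line), "\n";   # newline satisfies \D

chomp $line;
print "after chomp:  ", matches($line), "\n";   # nothing left for \D
```

The trailing newline counts as a non-digit, so the same line matches before chomp and stops matching after it.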





Re: regex expression to determine if i have a valid email!!

2005-09-16 Thread $Bill Luebkert
bruce wrote:
> hi...
> 
> i've got a php app, and i'm trying to figure out how/where to turn to to get
> a good working regex in order to determine if i have a valid email address
> 
> any help/thoughts/etc.. would be seriously helpful...
> 
> i've come across a great many preg_match functions for php, but i haven't
> run across one that works all the time!!
> 
> i finally figured that someone here, might have the exact soln to my prob!

Try these links for help:

http://search.cpan.org/src/MAURICE/Email-Valid-0.15/Valid.pm
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz



Re: regex expression to determine if i have a valid email!!

2005-09-16 Thread $Bill Luebkert
Chris Wagner wrote:

> Well it very much depends on what u consider a valid email address.
> Because technically, anything can be valid in some context.  What u probably
> want here is a fully qualified Internet mail address.  The basic form of
> this would be m/[EMAIL PROTECTED]/.

The protocol expects a path like this to the mail server:

<path> ::= "<" [ <a-d-l> ":" ] <mailbox> ">"

The mailbox portion is what you would normally see at the app.
See <mailbox> below:

<a-d-l> ::= <at-domain> | <at-domain> "," <a-d-l>

<at-domain> ::= "@" <domain>

<domain> ::=  <element> | <element> "." <domain>

<element> ::= <name> | "#" <number> | "[" <dotnum> "]"

<mailbox> ::= <local-part> "@" <domain>

<local-part> ::= <dot-string> | <quoted-string>

<name> ::= <a> <ldh-str> <let-dig>

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig> ::= <a> | <d>

<let-dig-hyp> ::= <a> | <d> | "-"

<dot-string> ::= <string> | <string> "." <dot-string>

<string> ::= <char> | <char> <string>

<quoted-string> ::=  """ <qtext> """

<qtext> ::=  "\" <x> | "\" <x> <qtext> | <q> | <q> <qtext>

<char> ::= <c> | "\" <x>

<dotnum> ::= <snum> "." <snum> "." <snum> "." <snum>

<number> ::= <d> | <d> <number>

<CRLF> ::= <CR> <LF>

<CR> ::= the carriage return character (ASCII code 13)

<LF> ::= the line feed character (ASCII code 10)

<SP> ::= the space character (ASCII code 32)

<snum> ::= one, two, or three digits representing a decimal
  integer value in the range 0 through 255

<a> ::= any one of the 52 alphabetic characters A through Z
  in upper case and a through z in lower case

<c> ::= any one of the 128 ASCII characters, but not any
  <special> or <SP>

<d> ::= any one of the ten digits 0 through 9

<q> ::= any one of the 128 ASCII characters except <CR>,
  <LF>, quote ("), or backslash (\)

<x> ::= any one of the 128 ASCII characters (no exceptions)

<special> ::= "<" | ">" | "(" | ")" | "[" | "]" | "\" | "."
  | "," | ";" | ":" | "@"  """ | the control
  characters (ASCII codes 0 through 31 inclusive and
  127)

>  If u want to limit that to known
> legitimate MX's u can do DNS lookups on the domain part.
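
A few of the domain productions above translate fairly directly into Perl patterns. This is only a sketch of the domain side of the grammar as quoted, not a full address validator; the variable names and is_domain() are made up for illustration.

```perl
use strict;
use warnings;

# snum: one to three digits with a value from 0 through 255
my $snum    = qr/(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])/;

# dotnum: four snum fields separated by dots
my $dotnum  = qr/$snum\.$snum\.$snum\.$snum/;

# name: a letter, one or more letters/digits/hyphens, then a letter or digit
my $name    = qr/[A-Za-z][A-Za-z0-9-]+[A-Za-z0-9]/;

# number: one or more digits
my $number  = qr/[0-9]+/;

# element: a name, "#" number, or "[" dotnum "]"
my $element = qr/(?:$name|\#$number|\[$dotnum\])/;

# domain: one or more elements separated by dots
my $domain  = qr/$element(?:\.$element)*/;

# Hypothetical helper: 1 if the whole string is a domain per the patterns above.
sub is_domain {
    my ($s) = @_;
    return $s =~ /\A$domain\z/ ? 1 : 0;
}

print "$_ => ", is_domain($_), "\n"
    for 'mail.example.com', '[192.168.0.1]', '[256.1.1.1]', '#123';
```

Read literally, the name production needs at least three characters, so very short labels are rejected by this sketch. Real-world checks are usually looser than that.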




RE: regex expression to determine if i have a valid email!!

2005-09-16 Thread Thomas, Mark - BLS CTR
> i've got a php app

Say it isn't so! :)

> i finally figured that someone here, might have the exact 
> soln to my prob!

1. Make sure it is formatted correctly, as per RFC 822. To do that, there is
one big scary regex created by Jeffrey Friedl (author of Mastering Regular
Expressions). If you check the link that $Bill gave you to Email::Valid, the
regular expression is in there.

2. Make sure the domain is valid and resolves to an IP address.

3. Make sure an MX record is defined for the domain, and the mail host is
accessible.

4. (Optional) Check the IP against selected blacklist(s).

Of course, the easiest way to do this is Perl. Email::Valid does 1-3 for
you.
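
A rough sketch of what calling Email::Valid could look like. Email::Valid is a CPAN module rather than core Perl, so this loads it only if it is installed and otherwise falls back to a deliberately loose shape check. The fallback pattern and the name looks_like_email() are assumptions for illustration, nowhere near RFC 822.

```perl
use strict;
use warnings;

# Load Email::Valid if it happens to be installed.
my $have_email_valid = eval { require Email::Valid; 1 };

# Hypothetical helper: 1 if the address looks usable, 0 otherwise.
sub looks_like_email {
    my ($addr) = @_;
    return 0 unless defined $addr;

    if ($have_email_valid) {
        # address() returns the address on success and undef on failure.
        my $ok = Email::Valid->address($addr);
        return defined $ok ? 1 : 0;
    }

    # Fallback (assumption): something@something.something, no spaces.
    return $addr =~ /\A[^\@\s]+\@[^\@\s]+\.[^\@\s]+\z/ ? 1 : 0;
}

print "$_ => ", looks_like_email($_), "\n"
    for 'user@example.com', 'no-at-sign';
```

The module also accepts named options, e.g. Email::Valid->address(-address => $addr, -mxcheck => 1), for the DNS side. That needs a working resolver, so it is left out of the sketch.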

- Mark.



Re: regex expression to determine if i have a valid email!!

2005-09-16 Thread pDale
On 9/16/05, Chris Wagner [EMAIL PROTECTED] wrote:
> Well it very much depends on what u consider a valid email address.
> Because technically, anything can be valid in some context.  What u probably
> want here is a fully qualified Internet mail address.  The basic form of
> this would be m/[EMAIL PROTECTED]/.  If u want to limit that to known
> legitimate MX's u can do DNS lookups on the domain part.
>
> At 08:23 PM 9/15/05 -0700, [EMAIL PROTECTED] wrote:
>> hi...
>>
>> i've got a php app, and i'm trying to figure out how/where to turn to to get
>> a good working regex in order to determine if i have a valid email address
>>
>> any help/thoughts/etc.. would be seriously helpful...
>>
>> i've come across a great many preg_match functions for php, but i haven't
>> run across one that works all the time!!

Here's an article that discusses true validation: http://coveryourasp.com/ValidateEmail.asp

-- 
pDale
Quando Omni Flunkus Moritati.
 (When all else fails, play dead.)
 -- Red Green


RE: regex expression to determine if i have a valid email!!

2005-09-16 Thread bruce
so mark...

what you're saying is that i should write a simple perl app that does the
email validation, and call it from the php app, passing it the email address
to check

-bruce





RE: regex expression to determine if i have a valid email!!

2005-09-16 Thread Thomas, Mark - BLS CTR
Sure. For extra coolness, you could do this via Ajax so that it is validated
on the fly while the user is still filling out the form. 

- Mark.



RE: regex expression to determine if i have a valid email!!

2005-09-16 Thread Ahlsen-Girard Edward F Contr MPSG

There is also the address checking expression in the very nice Mail-Sendmail module.




regex expression to determine if i have a valid email!!

2005-09-15 Thread bruce
hi...

i've got a php app, and i'm trying to figure out how/where to turn to to get
a good working regex in order to determine if i have a valid email address

any help/thoughts/etc.. would be seriously helpful...

i've come across a great many preg_match functions for php, but i haven't
run across one that works all the time!!

i finally figured that someone here, might have the exact soln to my prob!


thanks

-bruce
[EMAIL PROTECTED]




RE: regex expression to determine if i have a valid email!!

2005-09-15 Thread Charles K. Clarkson
bruce  wrote:

: i've got a php app, and i'm trying to figure out how/where to
: turn to to get a good working regex in order to determine if i
: have a valid email address 
: 
: any help/thoughts/etc.. would be seriously helpful...
: 
: i've come across a great many preg_match functions for php, but
: i haven't run across one that works all the time!!

I think this works if you remove comments from the address
first. Email will probably screw it up. Search google for it.

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*))*|(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*)*\(?:(?:\r\n)?[ \t])*(?:@(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*))*\(?:(?:\r\n)?[ \t])*)|(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*))*|(?:[^()@,;:\\.\[\] \000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*)*\(?:(?:\r\n)?[ \t])*(?:@(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*))*\(?:(?:\r\n)?[ \t])*)(?:,\s*(?:(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|(?:[^\\r\\]|\\.|(?:(?:\r\n)?[
\t]))*(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()@,;:\\.\[\]
\000-\031]+(?:(?:(?:\r\n)?[
\t])+|\Z|(?=[\[()@,;:\\.\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
\t])*)(?:\.(?:(?:\r

Re: regex expression to determine if i have a valid email!!

2005-09-15 Thread Chris Wagner
Well it very much depends on what u consider a valid email address.
Because technically, anything can be valid in some context.  What u probably
want here is a fully qualified Internet mail address.  The basic form of
this would be m/[EMAIL PROTECTED]/.  If u want to limit that to known
legitimate MX's u can do DNS lookups on the domain part.

At 08:23 PM 9/15/05 -0700, [EMAIL PROTECTED] wrote:
> hi...
>
> i've got a php app, and i'm trying to figure out how/where to turn to to get
> a good working regex in order to determine if i have a valid email address
>
> any help/thoughts/etc.. would be seriously helpful...
>
> i've come across a great many preg_match functions for php, but i haven't
> run across one that works all the time!!
>
> i finally figured that someone here, might have the exact soln to my prob!








