Re: Complex Regex.

Paul Tue, 24 Apr 2001 10:04:53 -0700

--- [EMAIL PROTECTED] wrote:
> I thought I was improving at expressions but this one has me stumped:
> 
> I have text interspersed with numbers.  The text can be anything,
> including all types of punctuation marks.
> 
> Well let me give an example:  
> 
> The Text has numbers in it apparently-1.0 at random but 12 actually
> not...
> $% ---- 12.3   0.9  .333  33 and -33.9000009  I need to extract
> u&t(y2335.09u see
> 
> all the legiti5.33mate integers and floating point numbers and
> separate them
> with CRs.  All text and nun-numeric data should be chucked.  it needs
> to
> look for negative numbers but 2-1 and 4-66.7  are "2  1  4  66.7".  
> That is
> - is a delimiter unless there's a space in front in which case the
> number is
> negative.
> 
> The output for the text above should be:
> 
> -1.0  
> 12
> 12.3
> 0.9
> .333
> 33
> -33.9000009  
> 2335.09
> 5.33
> 2
> 1
> 4
> 66.7
> 2
> 1
> 2
> 66.7
> 
> Any help on the above would be GREATLY appreciated.  

Okay, I got this to work.
Here's the output:

The text:
 
The Text has numbers in it apparently-1.0 at random but 12 actually
not...
$% ---- 12.3   0.9  .333  33 and -33.9000009  I need to extract
u&t(y2335.09u see
all the legiti5.33mate integers and floating point numbers and
separate them
with CRs.  All text and nun-numeric data should be chucked.  it needs
to
look for negative numbers but 2-1 and 4-66.7  are "2  1  4  66.7".
That is
- is a delimiter unless there's a space in front in which case the
number is
negative.
 
The output:
1.0
12
12.3
0.9
.333
33
-33.9000009
2335.09
5.33
2
1
4
66.7
2
1
4
66.7

Here's the code:

 use strict;                      # good habit
 open (my $fh, '<', $ARGV[0])     # as a script, processes first arg
   or die "Can't open '$ARGV[0]': $!";  # 3-arg open: name is never read as a mode or pipe
 my($data,$result);               # some working vars
 { local $/ = undef;              # set up to ....
   $data = <$fh>;                 # slurp the infile into $data
 }                                # close the scope of the local()
 close($fh);                      # close the arg file
 $result = join "\r\n", $data =~ /((?: -)?\d*[.]?\d+)/sog;
 $result =~ s/^ //omg;            # polish out the leading spaces
 print "The text:\n $data \nThe output:\n$result\n";

and the regex elaborated:

 /((?: -)?\d*[.]?\d+)/sog;

First, I check for negatives, which you said would have a
space-and-then-a-minus. I wrap that in (?: ), which groups without
capturing, so it doesn't create another capture variable. That way the
match still returns a neat list, but I can say (?: -)?, which means
"zero or one ' -'". A space-minus becomes part of the match, but a
minus preceded by anything else is skipped.
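
To see the grouping at work, here is a small sketch using the same
pattern. The sub name extract_numbers and the sample strings are made up
for illustration; only the regex comes from the script above.

```perl
use strict;
use warnings;

# Same pattern as in the script; (?: -) groups the space-minus
# without adding a second capture group.
my $re = qr/((?: -)?\d*[.]?\d+)/;

# Hypothetical helper: returns every match as a flat list,
# one element per number found.
sub extract_numbers {
    my ($text) = @_;
    return $text =~ /$re/g;
}

# Made-up samples: minus after a space versus minus after something else.
for my $sample ('a -5 b', 'a-5 b', '2-1', '4 -66.7') {
    my @found = extract_numbers($sample);
    print "'$sample' => [", join('|', @found), "]\n";
}
```

Note that for 'a -5 b' the single element returned is ' -5', leading
space included, while 'a-5 b' gives just '5'.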

The rest of the pattern is simple enough: any number of digits
(including none), followed by an optional decimal point, followed by
one or more digits. That gets (using zero for any digit here) 0.0, .0,
and 0, so with the optional sign we've covered 0.0, -0.0, .0, -.0, 0,
and -0.
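
A quick way to check those forms is to anchor the pattern and test each
one on its own. This is only a sketch: is_whole_match is a hypothetical
helper, and the last two test strings are extra cases added to show what
the pattern does not accept as a whole.

```perl
use strict;
use warnings;

# The pattern from the script, without the outer capture.
my $number = qr/(?: -)?\d*[.]?\d+/;

# Hypothetical helper: true when the entire string is exactly one
# number in this pattern's sense.
sub is_whole_match {
    my ($s) = @_;
    return $s =~ /\A$number\z/ ? 1 : 0;
}

# The six forms listed above, plus a trailing-dot number and a minus
# with no space in front of it.
for my $form ('0.0', ' -0.0', '.0', ' -.0', '0', ' -0', '5.', '-5') {
    printf "%-8s %s\n", "'$form'",
        is_whole_match($form) ? 'matches' : 'no match';
}
```

The six listed forms match; '5.' and '-5' do not match as a whole,
because the pattern needs a digit after the dot and a space before the
minus.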

The join stacks them with CRLFs between, accomplishing everything but
erasing the leading space that gets matched along with the minus on
negatives.

 $result =~ s/^ //omg;            # polish out the leading spaces

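So, we treat $result as multiple lines (that's the /m) so that ^ matches
after every newline, and knock off the spaces everywhere (that's the
/g). The /o and /s flags are harmless but not needed here, since the
patterns have no interpolated variables and no bare dot. Done!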
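
For comparison, here is a sketch of the two-step approach next to a
variation that avoids the cleanup. The variation is not from the script
above: it assumes a lookbehind, (?<= ), which requires the space without
making it part of the match. The sub names and the sample string are
made up.

```perl
use strict;
use warnings;

# Two-step approach as described: match with the leading space,
# then strip it from the start of each line.
sub extract_two_step {
    my ($text) = @_;
    my $result = join "\r\n", $text =~ /((?: -)?\d*[.]?\d+)/g;
    $result =~ s/^ //mg;    # /m lets ^ match after each embedded newline
    return $result;
}

# Variation (an assumption, not the original method): the lookbehind
# checks for the space but does not consume it, so nothing needs
# stripping afterwards.
sub extract_lookbehind {
    my ($text) = @_;
    return join "\r\n", $text =~ /((?:(?<= )-)?\d*[.]?\d+)/g;
}

my $sample = 'x -1.5 and 2-1 and -33.9';    # made-up input
print extract_two_step($sample), "\n";
print extract_lookbehind($sample), "\n";
```

On this sample both subs return the same four numbers (-1.5, 2, 1,
-33.9) joined by CRLFs. I have not compared them beyond samples like
this one, so treat the lookbehind version as a starting point.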

Hope that helps, and feel free to ask questions!

Paul
