--- [EMAIL PROTECTED] wrote:
> I thought I was improving at expressions but this one has me stumped:
>
> I have text interspersed with numbers. The text can be anything,
> including all types of punctuation marks.
>
> Well let me give an example:
>
> The Text has numbers in it apparently-1.0 at random but 12 actually
> not...
> $% ---- 12.3 0.9 .333 33 and -33.9000009 I need to extract
> u&t(y2335.09u see
>
> all the legiti5.33mate integers and floating point numbers and
> separate them
> with CRs. All text and nun-numeric data should be chucked. it needs
> to
> look for negative numbers but 2-1 and 4-66.7 are "2 1 4 66.7".
> That is
> - is a delimiter unless there's a space in front in which case the
> number is
> negative.
>
> The output for the text above should be:
>
> -1.0
> 12
> 12.3
> 0.9
> .333
> 33
> -33.9000009
> 2335.09
> 5.33
> 2
> 1
> 4
> 66.7
> 2
> 1
> 2
> 66.7
>
> Any help on the above would be GREATLY appreciated.
Okay, I got this to work.
Here's the output:
The text:
The Text has numbers in it apparently-1.0 at random but 12 actually
not...
$% ---- 12.3 0.9 .333 33 and -33.9000009 I need to extract
u&t(y2335.09u see
all the legiti5.33mate integers and floating point numbers and separate
them
with CRs. All text and nun-numeric data should be chucked. it needs
to
look for negative numbers but 2-1 and 4-66.7 are "2 1 4 66.7".
That
is
- is a delimiter unless there's a space in front in which case the
number
is
negative.
The output:
1.0
12
12.3
0.9
.333
33
-33.9000009
2335.09
5.33
2
1
4
66.7
2
1
4
66.7
Here's the code:
use strict; # good habit
use warnings; # catches common mistakes
open (my $fh, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!"; # as a script, processes first arg
my($data,$result); # some working vars
{ local $/ = undef; # set up to ....
$data = <$fh>; # slurp the infile into $data
} # close the scope of the local()
close($fh); # close the arg file
$result = join "\r\n", $data =~ /((?: -)?\d*[.]?\d+)/sog;
$result =~ s/^ //omg; # polish out the leading spaces
print "The text:\n $data \nThe output:\n$result\n";
And the regex, elaborated:
/((?: -)?\d*[.]?\d+)/sog;
First, I check for negatives, which you said would have a
space-and-then-a-minus. I wrap those two characters in (?: ), a
non-capturing group: it lets me group them without creating another
capture, so the match still returns a neat, flat list. Then I can say
(?: -)?, which means "zero or one ' -'". A space-minus becomes part of
the match, but a minus after anything else is skipped.
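To see why the non-capturing group matters, here is a small sketch comparing it with a plain capturing group. The sample string is made up for illustration and is not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample string, not from the original post.
my $sample = "a -1.5 b 2-3";

# Non-capturing (?: -)? : one capture group, so one list element per match.
my @flat = $sample =~ /((?: -)?\d*[.]?\d+)/g;

# Capturing ( -)? : two capture groups, so two list elements per match,
# and the inner one is undef whenever the optional ' -' did not match.
my @nested = $sample =~ /(( -)?\d*[.]?\d+)/g;

print scalar(@flat), " elements with (?:...)\n";
print scalar(@nested), " elements with (...)\n";
```

With this sample, @flat holds three elements (' -1.5', '2', '3'), while @nested holds six, because every match also contributes its inner group.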
The rest of the pattern is simple enough: any number of digits
(including none), followed by at most one decimal point, followed by one
or more digits. That matches (using zero for any digit here) 0.0, .0,
and 0, so with the optional sign we've covered 0.0, -0.0, .0, -.0, 0,
and -0.
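The same pattern can be written with /x so each piece carries its own comment. This is only a restatement of the regex above, run against the six shapes just listed. The name $number is chosen here for illustration.

```perl
use strict;
use warnings;

# The same pattern, spelled out with the x modifier for readability.
my $number = qr/
    (            # capture the whole number
      (?:\ -)?   # optional space-then-minus, the negative sign
      \d*        # zero or more digits before the point
      [.]?       # at most one decimal point
      \d+        # at least one digit to finish
    )
/x;

# Try each of the six shapes; negatives carry their leading space.
for my $case ('0.0', ' -0.0', '.0', ' -.0', '0', ' -0') {
    my ($got) = $case =~ $number;
    print "'$case' => '$got'\n";
}
```

Each case matches in full, including the leading space on the negatives. Note that under /x a literal space has to be escaped, hence the "\ " inside the group.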
The join stacks them with CRLFs in between, which does everything
except remove the leading space on negatives. That's what the next line
is for:
$result =~ s/^ //omg; # polish out the leading spaces
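The /m makes ^ match at the start of every line in $result, not just at
the start of the string, and /g repeats the substitution for each line,
so the leading spaces get knocked off. Done!
(Neither /o nor /s actually does anything in these patterns: there are
no variables to interpolate and no bare . for /s to change, so both can
be dropped.)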
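As a possible variation (not what the script above does), a lookbehind can check for the space without making it part of the match, so no cleanup pass is needed. This is a sketch under two assumptions: the helper name extract_numbers is made up here, and any whitespace (not only a space) in front of the minus counts as "a space in front". A minus at the very start of the text still would not count as a sign.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the numbers found in a string, one per element.
# Call it in list context.
sub extract_numbers {
    my ($text) = @_;
    # (?<=\s)- accepts a minus only when whitespace comes right before it,
    # but the whitespace itself is not part of the match.
    return $text =~ /((?:(?<=\s)-)?\d*[.]?\d+)/g;
}

my @nums = extract_numbers("and -33.9000009 but 2-1 and 4-66.7");
print join("\n", @nums), "\n";
```

For that sample the list comes back as -33.9000009, 2, 1, 4, 66.7, with no leading space to strip afterwards.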
Hope that helps, and feel free to ask questions!
Paul