Re: Zero-suppression Regex
Alistair,

The input data consists of numeric strings created in two different formats from two different platforms, one having leading signs and one having trailing signs. The commify will ignore the trailing signs, which is fine for my requirements, unless someone wants to propose another regex to swap the signs, i.e. make all output strings have either leading or trailing signs, regardless of input.

I changed my mind on the justification. The commify construct alters the length in a manner that could result in a string that is longer than the input, so for the program which uses this sub, I found it more pleasant to see all of the strings left-justified, i.e.:

    1:  1-4:    4:N:160 Record ID                160
    2:  5-9:    5:A:160 Type Of Service          >COOP <
    3: 10-10:   1:A:160 Reading Activity Code    >R<
    4: 11-25:  15:A:160 Meter Number             >17778 <
    5: 26-40:  15:A:160 Secondary Meter Number   > <
    6: 41-42:   2:N:160 Position Number          1
    7: 43-47:   5:A:160 Rate Schedule            >RES <
    8: 48-48:   1:N:160 Meter Type               0
    9: 49-51:   3:A:160 Who Read Meter Code      >5 <
   10: 52-52:   1:N:160 Electrical Use Code      0
   11: 53-53:   1:N:160 Number Of Dials          5
   12: 54-62:   9:4:160 Multiplier               1.
   13: 63-71:   9:N:160 Present Reading          38,512
   14: 72-79:   8:D:160 Present Reading Date     02-04-2002
   15: 80-80:   1:A:160 Present Reading Code     >0<
   16: 81-89:   9:N:160 Previous Reading         38,140
   17: 90-97:   8:D:160 Previous Reading Date    01-09-2002
   18: 98-98:   1:A:160 Previous Reading Code    >0<
   19: 99-107:  9:S:160 KWH Usage                372+
   20:108-116:  9:S:160 Revenue                  40.14+
   21:117-125:  9:4:160 KVAR Multiplier          0.
   22:126-134:  9:2:160 KVAR Previous Reading    0.00
   23:135-143:  9:2:160 KVAR Present Reading     0.00
   24:144-152:  9:S:160 KVAR Usage               0.00+
   25:153-172: 20:A:160 Map Location             >162-24-18 <
   26:173-174:  2:N:160 Register Number          1
   27:175-179:  5:N:160 Reading Number           0
   28:180-180:  1:A:160 End Of Record            >X<

Based upon a suggestion from $Bill on argument handling, the latest revision is:

sub ZeroSuppress($;$)
{
    local $_ = $_[0];

    # If the argument for the decimal point exists, insert a decimal point
    # at the specified location.
    s/(\d{$_[1]})([-+]?)$/\.$1$2/ if ($_[1]);

    # Zero-suppress the string.
    s/\b0+(?=\d)//;

    # Remove leading spaces.
    s/([-+]?)\s*/$1/;

    # Insert comma separators.
    1 while s/^([-+]?\d+)(\d{3})/$1,$2/;

    return($_);
}

Dirk Bremer - Systems Programmer II - ESS/AMS - NISC St. Peters
636-922-9158 ext. 652 fax 636-447-4471
[EMAIL PROTECTED]
www.nisc.cc

___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
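Dirk's open question, making every output string carry its sign in the same place, could be handled with one more substitution applied after ZeroSuppress. Below is a minimal sketch, assuming the sign is either the first or the last non-blank character of the field. The helper names lead_sign and trail_sign are hypothetical; nobody proposed them in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: move a trailing sign to the front ("40.14+" -> "+40.14").
# Strings that already start with a sign, or carry no sign, come back unchanged.
# Leading blanks are dropped when the sign is moved.
sub lead_sign {
    my ($s) = @_;
    $s =~ s/^\s*([^-+\s].*?)([-+])$/$2$1/;
    return $s;
}

# Hypothetical helper: move a leading sign to the end ("-40.14" -> "40.14-").
# Strings without a leading sign come back unchanged.
sub trail_sign {
    my ($s) = @_;
    $s =~ s/^\s*([-+])(.*)$/$2$1/;
    return $s;
}

for my $field ('372+', '40.14+', '0.00+', '-38,512', '1') {
    printf "%-8s lead: %-8s trail: %s\n", $field, lead_sign($field), trail_sign($field);
}
```

Running one of these on the result of ZeroSuppress keeps the zero-suppression and commify steps untouched. Only the sign position is normalized afterwards.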
Re: Zero-suppression Regex
Carl Jolley wrote:
> On Fri, 15 Feb 2002, Dirk Bremer wrote:
>
>> $Bill, it was not you who made a mistake with the benchmark, it was I and in the
>> process learned a lot of new things about the Benchmark module, which is a
>> wonderful tool and should be used by anyone who is interested in performance.
>> This is my revised zero-suppression routine:
>>
>> sub ZeroSuppress($;$)
>> {
>>     local $_ = $_[0];
>>
>>     # If the argument for the decimal point exists, insert a decimal point
>>     # at the specified location.
>>     s/(\d{$_[1]})$/\.$1/ if (@_ > 1 and $_[1] > 0);
>>
>>     # Zero-suppress the string.
>>     s/\b0+(?=\d)//;
>>
>>     # Remove leading spaces.
>>     s/([-+]?)\s*/$1/;
>>
>>     # Insert comma separators.
>>     1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
>>
>>     return($_);
>> }
>>
>> If a second numeric argument is supplied, it will specify where a decimal point
>> will be inserted into the resulting string, which will be left-justified. This
>> routine is very valuable to me for a program that displays the values of a data
>> file where the money-type fields are plain strings. Any suggestions for
>> improvements will be welcomed.
>>
>> Dirk Bremer - Systems Programmer II - ESS/AMS - NISC St. Peters
>> 636-922-9158 ext. 652 fax 636-447-4471
>>
>> [EMAIL PROTECTED]
>> www.nisc.cc
>
> Based on precedence rules, shouldn't:
>
> s/(\d{$_[1]})$/\.$1/ if (@_ > 1 and $_[1] > 0);
>
> be more correctly coded as:
>
> s/(\d{$_[1]})$/\.$1/ if (@_ > 1 && $_[1] > 0);

Both '&&' and 'and' have lower precedence than '>', so either should work fine.

> Won't "and $_[1] > 0" be executed even if @_ == 1?
> OTOH, won't $_[1] > 0 be false when @_ > 1 is false,
> i.e. could not the condition be safely collapsed to:
>
> if $_[1] > 0; ?

Or better yet:

if $_[1];

which might be better, since it handles an undefined argument without the numeric comparison to 0.
-- 
$Bill Luebkert    ICQ=14439852
DBE Collectibles  Mailto:[EMAIL PROTECTED]
http://dbecoll.tripod.com/ (Free site for Perl)
http://www.todbe.com/
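$Bill's precedence point, and the undef-safety of the plain truth test, can be checked directly. The small sketch below uses sub names that are hypothetical, chosen only for the illustration. It also shows that both 'and' and '&&' short-circuit, so the right-hand comparison is never evaluated when only one argument is passed.

```perl
use strict;
use warnings;

# Both conditions parse as (@_ > 1) AND ($_[1] > 0), because '>' binds more
# tightly than either '&&' or 'and'. Both operators also short-circuit, so
# $_[1] > 0 is not evaluated when only one argument was passed.
sub check_and { my $ok = 0; $ok = 1 if @_ > 1 and $_[1] > 0; return $ok }
sub check_amp { my $ok = 0; $ok = 1 if @_ > 1 &&  $_[1] > 0; return $ok }

# A plain truth test: no numeric comparison, so an undef second argument
# does not trigger an "uninitialized value" warning.
sub check_truth { my $ok = 0; $ok = 1 if $_[1]; return $ok }

my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

for my $args (['000123'], ['000123', 2], ['000123', 0]) {
    printf "%-16s and=%d  &&=%d  truth=%d\n",
        join(',', @$args), check_and(@$args), check_amp(@$args), check_truth(@$args);
}

check_and('000123', undef);     # numeric '>' on undef: warns under 'use warnings'
my $after_cmp = @warnings;
check_truth('000123', undef);   # truth test on undef: silent
printf "warnings from '>' on undef: %d, from truth test: %d\n",
    $after_cmp, @warnings - $after_cmp;
```

Where 'and' versus '&&' does matter is next to operators that sit between them in precedence, such as '?:', '=' or 'or'. A bare comparison like this one is not such a case.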
Re: Zero-suppression Regex
On Fri, 15 Feb 2002, Dirk Bremer wrote:

> $Bill, it was not you who made a mistake with the benchmark, it was I and in the
> process learned a lot of new things about the Benchmark module, which is a
> wonderful tool and should be used by anyone who is interested in performance.
> This is my revised zero-suppression routine:
>
> sub ZeroSuppress($;$)
> {
>     local $_ = $_[0];
>
>     # If the argument for the decimal point exists, insert a decimal point
>     # at the specified location.
>     s/(\d{$_[1]})$/\.$1/ if (@_ > 1 and $_[1] > 0);
>
>     # Zero-suppress the string.
>     s/\b0+(?=\d)//;
>
>     # Remove leading spaces.
>     s/([-+]?)\s*/$1/;
>
>     # Insert comma separators.
>     1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
>
>     return($_);
> }
>
> If a second numeric argument is supplied, it will specify where a decimal point
> will be inserted into the resulting string, which will be left-justified. This
> routine is very valuable to me for a program that displays the values of a data
> file where the money-type fields are plain strings. Any suggestions for
> improvements will be welcomed.
>
> Dirk Bremer - Systems Programmer II - ESS/AMS - NISC St. Peters
> 636-922-9158 ext. 652 fax 636-447-4471
>
> [EMAIL PROTECTED]
> www.nisc.cc

Based on precedence rules, shouldn't:

s/(\d{$_[1]})$/\.$1/ if (@_ > 1 and $_[1] > 0);

be more correctly coded as:

s/(\d{$_[1]})$/\.$1/ if (@_ > 1 && $_[1] > 0);

Won't "and $_[1] > 0" be executed even if @_ == 1? OTOH, won't $_[1] > 0 be false when @_ > 1 is false, i.e. could not the condition be safely collapsed to:

if $_[1] > 0; ?

Also, in the replacement part of the substitute statement, escaping the decimal point is allowed but not required. The . character only has regex metacharacter meaning in the match part of a regex.
[EMAIL PROTECTED]
All opinions are my own and not necessarily those of my employer
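Carl's remark about the escaped decimal point is easy to confirm. In the replacement half of s///, a backslash before '.' is harmless but unnecessary. In the pattern half, an unescaped '.' matches any character. Here is a short sketch using arbitrary sample strings:

```perl
use strict;
use warnings;

my $raw = '000004014';

# Replacement side: '.' is literal text here, escaped or not.
(my $escaped   = $raw) =~ s/(\d{2})$/\.$1/;
(my $unescaped = $raw) =~ s/(\d{2})$/.$1/;
print "escaped:   $escaped\n";     # 0000040.14
print "unescaped: $unescaped\n";   # 0000040.14

# Pattern side: an unescaped '.' is a metacharacter matching any character.
my $any     = ('40x14' =~ /40.14/)  ? 'match' : 'no match';
my $literal = ('40x14' =~ /40\.14/) ? 'match' : 'no match';
print "/40.14/ on '40x14': $any; /40\\.14/ on '40x14': $literal\n";
```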
Re: Zero-suppression Regex
Dirk Bremer wrote:
> Here is a quicker version of the zero-suppression routine that will also float a
> leading sign character:
>
> sub ZeroSuppress2($)
> {
>     my $self = shift;
>
>     # Return if the argument is less than two digits.
>     return($self) if (length($self) < 2);
>
>     # Search for an embedded decimal point.
>     my $decimal = rindex($self,'.');
>
>     # If the decimal point exists, subtract one from its position
>     # to define the end point of the zero-suppression, else determine
>     # the entire length of the string and subtract one to define the
>     # end point of the zero-suppression.
>     if ($decimal > 0) {$decimal--;}
>     else {$decimal = length($self) - 1;}
>
>     # Iterate through the list suppressing leading zeroes.
>     my $i;
>     my $c;
>     for ($i = 0; $i < $decimal; $i++)
>     {
>         $c = substr($self,$i,1);
>         # Skip to the next character if the current character is a
>         # plus-sign, minus-sign, or space.
>         next if ($c eq ' ' or
>                  $c eq '+' or
>                  $c eq '-');
>
>         # Terminate the loop if the current character is a nonzero digit.
>         last if ($c > 0);
>
>         # Replace the zero in the current character with a space.
>         substr($self,$i,1) = ' ';
>     }
>
>     $c = substr($self,0,1);
>     # Check for a leading sign-character.
>     if ($c eq '+' or $c eq '-')
>     {
>         # If the list has more than two elements, move the sign-character
>         # from the leftmost position to the position of the rightmost
>         # space character.
>         if ($i > 1) {substr($self,$i - 1,1) = $c; substr($self,0,1) = ' ';}
>     }
>
>     return($self);
> }
>
> After running some more benchmarks, this routine is several orders of magnitude
> faster than sprintf or a regex.

I modified your routine a bit to handle replacing 0's with blanks or just removing the leading 0's. My benchmark shows the RE to be faster, unless I made a mistake somewhere.

sub ZeroSuppress2 ($;$) {
	my $self = shift;
	my $replace = shift || 0;

	# Return if the argument is less than two digits.
	return $self if length ($self) < 2;

	# Search for an embedded decimal point.
	# If the decimal point exists, subtract one from its position
	# to define the end point of the zero-suppression, else determine
	# the entire length of the string and subtract one to define the
	# end point of the zero-suppression.
	my $decimal = rindex ($self, '.') - 1;
	$decimal = length ($self) - 1 if $decimal < 0;

	# Iterate through the list suppressing leading zeroes.
	my ($c, $i);
	for ($i = 0; $i < $decimal; $i++) {
		$c = substr ($self, $i, 1);

		# Skip to the next character if the current character is a
		# plus-sign, minus-sign, or space.
		next if $c eq ' ' or $c eq '+' or $c eq '-';

		# Terminate the loop if the current character is a nonzero digit.
		last if ($c gt '0');

		# Replace the zero in the current character with a space,
		# or remove it entirely.
		if ($replace) {
			substr ($self, $i, 1) = ' ';
		} else {
			$self = substr ($self, 0, $i) . substr ($self, $i+1);
			$i--; $decimal--;
		}
	}

	$c = substr ($self, 0, 1);
	# Check for a leading sign-character (only when replacing with blanks;
	# the inner parentheses are needed because 'and' binds tighter than 'or').
	if ($replace and ($c eq '+' or $c eq '-')) {
		# If the list has more than two elements, move the sign-character
		# from the leftmost position to the position of the rightmost
		# space character.
		if ($i > 1) {
			substr ($self, $i - 1, 1) = $c;
			substr ($self, 0, 1) = ' ';
		}
	}
	return $self;
}

sub ZeroSuppress3 ($;$) {
	local $_ = shift;
	my $replace = shift || 0;

	if ($replace) {
		1 while s/(? "ZeroSuppress2 ($num, 0)",
	'ZS3'  => "ZeroSuppress3 ($num, 0)",
	'ZS2r' => "ZeroSuppress2 ($num, 1)",	# replace 0's with blanks
	'ZS3r' => "ZeroSuppress3 ($num, 1)",
});

my $num = '01.001';

Benchmark: timing 400000 iterations of ZS2, ZS2r, ZS3, ZS3r...
       ZS2:  7 wallclock secs ( 6.59 usr +  0.00 sys =  6.59 CPU) @ 60698.03/s (n=400000)
      ZS2r:  6 wallclock secs ( 6.81 usr +  0.00 sys =  6.81 CPU) @ 58737.15/s (n=400000)
       ZS3:  4 wallclock secs ( 3.84 usr +  0.00 sys =  3.84 CPU) @ 104166.67/s (n=400000)
      ZS3r:  5 wallclock secs ( 5.66 usr +  0.00 sys =  5.66 CPU) @ 70671.38/s (n=400000)

my $num = '+01.001';

Benchmark: timing 400000 iterations of ZS2, ZS2r, ZS3, ZS3r...
       ZS2:  9 wallclock secs ( 8.62 usr +  0.00 sys =  8.62 CPU) @ 46403.71/s (n=400000)
      ZS2r: 13 wallclock secs (12.14 usr +  0.00 sys = 12.14 CPU) @ 32948.93/s (n=400000)
       ZS3:  2 wallclock secs ( 2.64 usr +  0.00 sys =  2.64 CPU) @ 151515.15/s (n=400000)
      ZS3r:  3 wallclock secs ( 3.35 usr +  0.00 sys =  3.35 CPU) @ 119402.99
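One precedence detail is worth checking in sign tests like the ones in these routines. 'and' binds more tightly than 'or', so a condition written as A and B or C means (A and B) or C. In a check such as $replace and $c eq '+' or $c eq '-', a leading minus sign would therefore pass even when $replace is false. A minimal sketch comparing the two groupings follows (sub names are hypothetical):

```perl
use strict;
use warnings;

# Ungrouped: parses as ($replace and $c eq '+') or ($c eq '-').
sub sign_test_ungrouped {
    my ($replace, $c) = @_;
    my $hit = ($replace and $c eq '+' or $c eq '-') ? 1 : 0;
    return $hit;
}

# Grouped: the sign is only examined when $replace is true.
sub sign_test_grouped {
    my ($replace, $c) = @_;
    my $hit = ($replace and ($c eq '+' or $c eq '-')) ? 1 : 0;
    return $hit;
}

for my $case ([0, '+'], [0, '-'], [1, '+'], [1, '-'], [1, '0']) {
    my ($replace, $c) = @$case;
    printf "replace=%d c=%s ungrouped=%d grouped=%d\n",
        $replace, $c, sign_test_ungrouped($replace, $c), sign_test_grouped($replace, $c);
}
```

The two forms disagree only for a minus sign with $replace false, which is exactly the remove-zeros mode.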
Re: Zero-suppression Regex
Here is a quicker version of the zero-suppression routine that will also float a leading sign character:

sub ZeroSuppress2($)
{
    my $self = shift;

    # Return if the argument is less than two digits.
    return($self) if (length($self) < 2);

    # Search for an embedded decimal point.
    my $decimal = rindex($self,'.');

    # If the decimal point exists, subtract one from its position
    # to define the end point of the zero-suppression, else determine
    # the entire length of the string and subtract one to define the
    # end point of the zero-suppression.
    if ($decimal > 0) {$decimal--;}
    else {$decimal = length($self) - 1;}

    # Iterate through the list suppressing leading zeroes.
    my $i;
    my $c;
    for ($i = 0; $i < $decimal; $i++)
    {
        $c = substr($self,$i,1);
        # Skip to the next character if the current character is a
        # plus-sign, minus-sign, or space.
        next if ($c eq ' ' or
                 $c eq '+' or
                 $c eq '-');

        # Terminate the loop if the current character is a nonzero digit.
        last if ($c > 0);

        # Replace the zero in the current character with a space.
        substr($self,$i,1) = ' ';
    }

    $c = substr($self,0,1);
    # Check for a leading sign-character.
    if ($c eq '+' or $c eq '-')
    {
        # If the list has more than two elements, move the sign-character
        # from the leftmost position to the position of the rightmost
        # space character.
        if ($i > 1) {substr($self,$i - 1,1) = $c; substr($self,0,1) = ' ';}
    }

    return($self);
}

After running some more benchmarks, this routine is several orders of magnitude faster than sprintf or a regex.

Dirk Bremer - Systems Programmer II - ESS/AMS - NISC St. Peters
636-922-9158 ext. 652 fax 636-447-4471
[EMAIL PROTECTED]
www.nisc.cc
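For comparison with the character loop above, the same blank-filling behaviour can also be sketched with two substitutions. Zeros before the first significant digit become spaces, and a leading sign floats right to sit next to that digit. This is a hypothetical regex version written for illustration, not the ZeroSuppress3 benchmarked in the thread, and it assumes at most one leading sign. It anchors at the start of the string rather than using \b, so zeros after a decimal point (as in 0.05) are left alone.

```perl
use strict;
use warnings;

# Hypothetical regex counterpart of ZeroSuppress2: keeps the field width.
sub blank_suppress {
    local $_ = shift;

    # Blank out leading zeros (after any sign or spaces) that are followed by
    # another digit; each zero becomes one space, so the width is unchanged.
    s/^([-+\s]*)(0+)(?=\d)/$1 . (' ' x length($2))/e;

    # Float a sign in the first column right, past the blanks, so it sits
    # next to the first remaining digit.
    s/^([-+])(\s+)/$2$1/;

    return $_;
}

for my $in ('000.00', '+01.001', '-0001234', '0.05', '1200') {
    printf "[%s] -> [%s]\n", $in, blank_suppress($in);
}
```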
Re: Zero-suppression Regex
Alistair,

No need to fall off of the horse, you will get bruised that way. Sometimes you have to point your horse in a different direction to view the horizon. Note the following:

{
    use Benchmark;
    timethese(100000, {'sprintf' => sub {my $num = '0.00'; $num = sprintf('%1.2f',$num);},
                       'regex'   => sub {my $num = '0.00'; $num =~ s/\b0+(?=\d)//;}});
}

returns:

Benchmark: timing 100000 iterations of regex, sprintf...
     regex:  2 wallclock secs ( 2.30 usr +  0.00 sys =  2.30 CPU) @ 43402.78/s (n=100000)
   sprintf:  3 wallclock secs ( 2.39 usr +  0.00 sys =  2.39 CPU) @ 41788.55/s (n=100000)

I was quite surprised to see that the regex won out by a bit; I would have thought that it would have invoked more overhead. This has been a learning experience, which is one of the aspects of this list that I enjoy.

Dirk Bremer - Systems Programmer II - ESS/AMS - NISC St. Peters
636-922-9158 ext. 652 fax 636-447-4471
[EMAIL PROTECTED]
www.nisc.cc
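Here is a variant of the benchmark above, sketched with Benchmark's cmpthese, which prints a relative-speed table. Both candidates are wrapped as subs so their outputs can be checked as well as timed. Such a check turns up one thing. On '0.00' the unanchored pattern cannot match at the start, because that zero is followed by '.'. It matches the first zero after the decimal point instead and returns '0.0', while sprintf returns '0.00'. So the two timings are not for identical results. The sub names are hypothetical.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub via_sprintf { return sprintf('%1.2f', $_[0]) }
sub via_regex   { (my $n = $_[0]) =~ s/\b0+(?=\d)//; return $n }

# Check the outputs before comparing speed.
for my $in ('0.00', '000.00', '0012.50') {
    printf "%-8s sprintf=%-6s regex=%s\n", $in, via_sprintf($in), via_regex($in);
}

# A positive count runs each sub that many times; a negative count such
# as -2 would instead run each for at least 2 CPU seconds.
cmpthese(100_000, {
    sprintf => sub { via_sprintf('0.00') },
    regex   => sub { via_regex('0.00') },
});
```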
RE: Zero-suppression Regex
Dirk,

If you really want a general solution to get rid of initial zeros without affecting the remainder of the string, then I stand corrected; a regex would probably be best. Consider my high horse bruised, following an incident with a low bridge :-)

One last try though before my horse is retired. If all of your inputs have two decimal places then perhaps you should use sprintf:

sprintf('%0.2f',$_)

It might not be faster than a "^" anchored regex, but it is more elegant and can cope with a wide range of input formats. Horses for courses I suppose. :-)

Cheers,
Alistair

--
Alistair McGlinchy, [EMAIL PROTECTED]
Sizing and Performance, Central IT, ext. 5012, ph +44 20 7268-5012
Marks and Spencer, 3 Longwalk Rd, Stockley Park, Uxbridge UB11 1AW, UK

> -----Original Message-----
> From: Dirk Bremer [SMTP:[EMAIL PROTECTED]]
> Sent: Tuesday, February 12, 2002 6:52 PM
> To: 'Perl Win32 Users Mailing List'
> Subject: Re: Zero-suppression Regex
>
> In my instance, the 0+ solution would not produce the results I desire.
> For example, imagine the input string is "000.00" as in a dollar amount.
> The result I desire is "0.00", while the 0+ method results in "0". The
> regex that Joe provided produced my desired result, while the 0+ may be
> useful for other things.
>
> Dirk Bremer - Systems Programmer II - ESS/AMS - NISC St. Peters
> 636-922-9158 ext. 652 fax 636-447-4471
>
> [EMAIL PROTECTED]
> www.nisc.cc
>
> ----- Original Message -----
> From: <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Tuesday, February 12, 2002 11:36
> Subject: RE: Zero-suppression Regex
>
> > As Michael G Schwern over on the Fun With Perl group said:
> > > Folks, I'm clawing my eyes out here. Stop hitting the regex crack pipe!
> >
> > Although he was specifically talking about printf vs a regex for zero
> > padding numbers, the principle still applies. Perl has an incredibly
> > efficient "convert number-like text into a proper number" function. It
> > works implicitly every time you use a numerical operator on a scalar. The
> > regex tool is comparatively slow and an all-purpose, robust solution is
> > non-trivial.
> >
> > Why not just use 0+$_ and let perl work its magic. I challenge any of you
> > regex "pushers" out there to write a regex that beats this in either speed
> > or elegance.
> >
> > BTW Joe's solution s/^-?0+(?=\d)// will fail for negative numbers. I would
> > post a fix but I appear to be on my high horse at the moment :-)
Re: Zero-suppression Regex
In my instance, the 0+ solution would not produce the results I desire. For example, imagine the input string is "000.00" as in a dollar amount. The result I desire is "0.00", while the 0+ method results in "0". The regex that Joe provided produced my desired result, while the 0+ may be useful for other things.

Dirk Bremer - Systems Programmer II - ESS/AMS - NISC St. Peters
636-922-9158 ext. 652 fax 636-447-4471
[EMAIL PROTECTED]
www.nisc.cc

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, February 12, 2002 11:36
Subject: RE: Zero-suppression Regex

> As Michael G Schwern over on the Fun With Perl group said:
> > Folks, I'm clawing my eyes out here. Stop hitting the regex crack pipe!
>
> Although he was specifically talking about printf vs a regex for zero
> padding numbers, the principle still applies. Perl has an incredibly
> efficient "convert number-like text into a proper number" function. It works
> implicitly every time you use a numerical operator on a scalar. The regex
> tool is comparatively slow and an all-purpose, robust solution is
> non-trivial.
>
> Why not just use 0+$_ and let perl work its magic. I challenge any of you
> regex "pushers" out there to write a regex that beats this in either speed
> or elegance.
>
> BTW Joe's solution s/^-?0+(?=\d)// will fail for negative numbers. I would
> post a fix but I appear to be on my high horse at the moment :-)
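Dirk's objection is easy to reproduce. Numifying with 0+ discards the formatting along with the zeros, because Perl prints the resulting number in its shortest form. The regex only strips the leading zeros and leaves the decimal places alone. Here is a small sketch (the sub names are hypothetical):

```perl
use strict;
use warnings;

sub via_numify { return 0 + $_[0] }                                  # numeric value, shortest form
sub via_strip  { (my $s = $_[0]) =~ s/\b0+(?=\d)//; return $s }      # leading zeros only

for my $in ('000.00', '0012.50', '-007.10') {
    printf "%-8s 0+: %-6s regex: %s\n", $in, via_numify($in), via_strip($in);
}
```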
RE: Zero-suppression Regex
[EMAIL PROTECTED] wrote, on Tuesday, February 12, 2002 12:36 PM:
: Why not just use 0+$_ and let perl work its magic. I challenge any of you
: regex "pushers" out there to write a regex that beats this in either speed
: or elegance.

Oh, of course. Duh. I had blinders on from the OP's subject line.

: BTW Joe's solution s/^-?0+(?=\d)// will fail for negative numbers. I would
: post a fix but I appear to be on my high horse at the moment :-)

"...but it's too large for this margin", huh? Yeah, my bad. I should have stuck with the one with \b instead of ^-?. :)

Joe

==
Joseph P. Discenza, Sr. Programmer/Analyst
mailto:[EMAIL PROTECTED]
Carleton Inc.   http://www.carletoninc.com
219.243.6040 ext. 300   fax: 219.243.6060
Providing Financial Solutions and Compliance for over 30 Years
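The negative-number failure and Joe's fix of going back to \b can be seen side by side. In s/^-?0+(?=\d)//, the optional minus sign is part of the match, so it is deleted along with the zeros. With \b the sign stays outside the match. Here is a minimal sketch (sub names are hypothetical):

```perl
use strict;
use warnings;

sub strip_anchored { (my $s = $_[0]) =~ s/^-?0+(?=\d)//; return $s }   # sign is consumed
sub strip_boundary { (my $s = $_[0]) =~ s/\b0+(?=\d)//;  return $s }   # sign is kept

for my $in ('0012', '-0012', '-000.50') {
    printf "%-8s ^-?: %-6s \\b: %s\n", $in, strip_anchored($in), strip_boundary($in);
}
```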
Re: Zero-suppression Regex
Very nice. Thanks.

--- Jim ---

$Bill Luebkert <[EMAIL PROTECTED]> wrote:
> use strict;
>
> my @nums = qw(00123 04 004.01 000 00 0 .0 0.01
>   0012.001 000.0001);
> $_ = join ' ', @nums;	# save orig for bottom part
>
> foreach (@nums) {
> 	print "$_ => ";
> 	s/(?
> 	print "$_\n";
> }
> # doing them all at once also works:
> print "$_\n";
> s/(?
> print "$_\n";
Re: Zero-suppression Regex
Jim Angstadt wrote:
> Dear Joe and Dirk,
>
> Thanks for getting me to look at assertions.
>
> Expanding the requirement a little,
> here is what I have so far:
>
> my @nums = qw/ 00123 04 004.01 000 00 0 .0
>    0.01 0012.001 000.0001 /;
> foreach ( @nums ) {
>    s/(\b)0+(?=\d)(\.*.*)/$1$2/g;  # fails on 0.01
>    print $_, "\n";
> }
>
> Any ideas?

Try this one:

use strict;

my @nums = qw(00123 04 004.01 000 00 0 .0 0.01 0012.001 000.0001);
$_ = join ' ', @nums;	# save orig for bottom part

foreach (@nums) {
	print "$_ => ";
	s/(?

Mailto:[EMAIL PROTECTED]
http://dbecoll.tripod.com/ (Free site for Perl)
http://www.todbe.com/
Re: Zero-suppression Regex
Dear Joe and Dirk,

Thanks for getting me to look at assertions.

Expanding the requirement a little, here is what I have so far:

my @nums = qw/ 00123 04 004.01 000 00 0 .0 0.01 0012.001 000.0001 /;
foreach ( @nums ) {
   s/(\b)0+(?=\d)(\.*.*)/$1$2/g;  # fails on 0.01
   print $_, "\n";
}

Any ideas?

--- Jim ---

Dirk Bremer <[EMAIL PROTECTED]> wrote:
> s/0+(?=\d)(\d)/$1/g ?
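Jim's failing case comes from \b. A word boundary also exists between '.' and a digit, so in 0.01 the zero after the decimal point is treated as a leading zero. One way around that is a negative lookbehind that only removes a run of zeros when it is not preceded by a digit or a decimal point. This is sketched as an assumption, not as the exact regex $Bill posted, and the sub name strip_leading_zeros is hypothetical:

```perl
use strict;
use warnings;

my @nums = qw(00123 04 004.01 000 00 0 .0 0.01 0012.001 000.0001);

# Remove a run of zeros only when it is followed by another digit and is not
# preceded by a digit or '.', i.e. only at the start of the integer part.
sub strip_leading_zeros {
    my ($s) = @_;
    $s =~ s/(?<![\d.])0+(?=\d)//g;
    return $s;
}

print "$_ => ", strip_leading_zeros($_), "\n" for @nums;

# The /g form also works on all the numbers joined into a single string.
print strip_leading_zeros(join ' ', @nums), "\n";
```

Because '-' is not in the lookbehind class, a leading minus sign is kept as well, e.g. -007 becomes -7.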