Re: Removing the blank spaces

Williamawalters Mon, 13 Mar 2006 13:17:31 -0800

In a message dated 3/13/2006 9:43:44 A.M. Eastern Standard Time, [EMAIL PROTECTED] writes:

> [EMAIL PROTECTED] wrote on 03/11/2006 03:05:10 PM:
> > Today's Topics:
> >    4. Removing the blank spaces (Naresh Bajaj)
> > ----------------------------------------------------------------------
> > ------------------------------
> > Message: 4
> > Date: Sat, 11 Mar 2006 13:49:20 -0600
> > From: "Naresh Bajaj" <[EMAIL PROTECTED]>
> > Subject: Removing the blank spaces
> > To: [email protected]
> > Message-ID:
> >    <[EMAIL PROTECTED]>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Hello,
> > This is my problem. I have extracted one variable value from a file and
> > saved it to another file.
> > The problem is that it has too many spaces, as shown in this example. I want
> > to remove those blank spaces.
> > If I use split / /, $fti, I am getting partial results, as shown below.
> > Please let me know how I can remove those spaces. I would appreciate it.
> <examples removed>
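for comparison, the split route can work too, provided split is given the special pattern ' ' (a one-space string) rather than / /. with ' ', split behaves like awk: it splits on runs of whitespace and skips leading whitespace, so joining the pieces back with single spaces gives the cleaned string. a minimal sketch, assuming the value has already been read into a scalar; squeeze_with_split and $input are illustrative names, not taken from the original post:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical helper: squeeze whitespace using split/join instead of s///
sub squeeze_with_split {
    my ($text) = @_;
    # split ' ' (a single-space *string*) is special: it splits on runs of
    # whitespace and skips leading whitespace, so no empty fields come back
    my @words = split ' ', $text;
    # glue the words back together with exactly one space between them
    return join ' ', @words;
}

my $input = " example    information in     a string   ";

# split / / (a regex matching exactly one space) keeps an empty field for
# every extra space; the empty fields show up here as runs of '|'
print join('|', split / /, $input), "\n";

print "{", squeeze_with_split($input), "}\n";
```

the empty fields from split / / are probably where the "partial results" came from.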
> split creates an array on a boundary. i think that while it could be used
> for what you want, it would be so in a roundabout way. there are more
> direct methods. I suspect your dislike of the result is a product of
> not-enough-understanding of split (similar to the "too little information
> is worse than none" effect that creates panic among plebs). instead of
> trying to deduce your code, i'll give you a regexp that should give the
> desired results:
> [code]
> #! /usr/bin/perl
> $input=" example    information in     a string   ";
> $input =~ /\s+/ /g;

should be s/\s+/ /g; note initial s in s/// substitution.

> print "$input\n";
> $input =~ /^\s*(\S.*)\s*$/$1/;

should be s/^\s*(\S.*)\s*$/$1/;

> print "$input\n";
> [/code]
> should print:
> example information in a string
> example information in a string
> note that the first one has an extra " " at the end.

actually, both will have an extra space if there was any trailing whitespace at all.

> it could also have
> more \n than intended.

the first substitution s/\s+/ /g; will remove any and all \n.

> chomp removes that.

but there is no chomp().
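for the record, chomp only removes one trailing copy of the input record separator $/ (a single "\n" by default); it never touches spaces, tabs, or leading whitespace. and once s/\s+/ /g has run, there is no newline left for it to remove anyway. a quick check, with chomped as a made-up helper name:

```perl
use strict;
use warnings;

# hypothetical helper: return a chomped copy of the argument
sub chomped {
    my ($s) = @_;
    chomp $s;          # removes at most one trailing "\n" (default $/)
    return $s;
}

my $line = "  example  \n\n";
# only the last "\n" goes; the other "\n" and all the spaces stay
print "{", chomped($line), "}\n";

# after collapsing whitespace there is no "\n" left for chomp to remove
(my $squeezed = $line) =~ s/\s+/ /g;
print "{", chomped($squeezed), "}\n";
```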

> i'm not sure, and don't believe
> it would remove leading white space. to remove that, i used the second
> substitute instead.
>
> >
> > I hope I clearly explained the problem. Please let me know if you are
> > not clear about my issue. Thanks,
>
> you did, and the potential of the confusion over what split does is why
> i'm now going to add a little explanation of what the regular expressions
> are doing. in the hopes that i'll help teach you them. =o)
>
> /\s+/ /g

again, should be s/\s+/ /g
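the missing s matters: without it, =~ /\s+/ is only a match, which reports whether the pattern was found and leaves the string alone. a small sketch of the difference; the helper name squeeze is made up for illustration:

```perl
use strict;
use warnings;

# hypothetical helper: collapse every run of whitespace to one space
sub squeeze {
    my ($s) = @_;
    $s =~ s/\s+/ /g;   # s/// rewrites the string
    return $s;
}

my $input = " example    information in     a string   ";

# a plain match only tells you whether the pattern was found
my $found = ($input =~ /\s+/) ? "yes" : "no";
print "found whitespace: $found\n";
print "{$input}\n";                 # unchanged by the match
print "{", squeeze($input), "}\n";  # runs collapsed, ends still padded
```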

>
> uses perl's shorthand \s, which is [ \n\r\t] plus \f (form feed), i.e.
> "whitespace." + means "1 or more", so it'll find the first run of white
> space and replace it with the next part, a single " ". the g makes this
> global, so it is done throughout the whole set of data, hitting all the
> occurrences.
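to see what the g buys you, compare the same substitution with and without it. the test string mixes spaces, a tab and newlines, all of which \s matches; squeeze_first and squeeze_all are illustrative names only:

```perl
use strict;
use warnings;

# without /g, s/// stops after replacing the first run of whitespace
sub squeeze_first { my ($s) = @_; $s =~ s/\s+/ /;  return $s; }

# with /g, every run of whitespace in the string is replaced
sub squeeze_all   { my ($s) = @_; $s =~ s/\s+/ /g; return $s; }

my $text = "a  \t b\n\nc";
print "{", squeeze_first($text), "}\n";  # the "\n\n" run is still there
print "{", squeeze_all($text), "}\n";
```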
>
> /^\s*(\S.*)\s*$/$1/

again, should be s/^\s*(\S.*)\s*$/$1/

>
> again uses the \s short and also uses the \S short. the \S means [^\s],
> the . is any character except \n, and the * means 0 or more. the $ at the
> end is an end-of-line anchor and the ^ at the beginning is a
> beginning-of-line anchor.
> this finds all the white space until the first non-whitespace, then all
> the white space at the end. it then replaces the entire line with $!

typo: $! should be $1

> which
> is the capture from (\S.*) which is everything that is not the beginning
> and ending white space.
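one nit on the anchors: without the /m modifier, ^ and $ anchor to the start and end of the whole string (with $ also allowed to match just before a final newline), not to every line. that is harmless here because the first substitution has already removed every \n, but it is worth knowing. a small sketch:

```perl
use strict;
use warnings;

my $two_lines = "foo\nbar";

# without /m, $ only matches at the very end (or before a final "\n")
my $plain = ($two_lines =~ /o$/)  ? "match" : "no match";

# with /m, $ also matches just before each embedded "\n"
my $multi = ($two_lines =~ /o$/m) ? "match" : "no match";

# a single trailing newline is allowed after $ even without /m
my $final = ("foo\n" =~ /o$/)     ? "match" : "no match";

print "without /m:           $plain\n";
print "with /m:              $multi\n";
print "before final newline: $final\n";
```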

the problem here is that in the capturing parenthetic expression (\S.*) the .*
has a ``greedy'' * (zero or more) quantifier. this will consume everything to the
end of the string (or to the first \n, but there are no newlines any more due to the
action of the first substitution) and include those characters in the $1 capture variable.
if there was a space at the end of the string, it will be a part of $1.

the \s* just after the capturing parenthesis also has a greedy quantifier, but
in a situation like this, the first greedy quantifier to the table wins the day: \s* can
be satisfied with zero whitespace (although it would like more), so the regex as a
whole is satisfied.

to fix this problem, make the * quantifier in (\S.*) ``lazy'' with a ? modifier,
i.e., (\S.*?). a lazy quantifier takes as few characters as it can, so it stops as
soon as the rest of the pattern can match and lets the \s* gobble as much trailing
whitespace as it wants. see code examples below.
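another common way to get the same result, offered as an alternative rather than the approach taken in this thread, is to skip the capture altogether: collapse the whitespace, then delete whatever is left at either end with s/^\s+|\s+$//g. with no capture group, greedy vs. lazy never comes into play. normalize_spaces is a hypothetical name:

```perl
use strict;
use warnings;

# hypothetical helper: collapse interior whitespace, then trim both ends
sub normalize_spaces {
    my ($s) = @_;
    $s =~ s/\s+/ /g;       # runs of whitespace (incl. \n) -> one space
    $s =~ s/^\s+|\s+$//g;  # delete the single space left at either end
    return $s;
}

my $input = qq( example \n information in \n\n a string );
print "{", normalize_spaces($input), "}\n";
```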

>
> > --
> > Naresh Bajaj, Intern,
> > Cardiac Rhythm Disease Management,
> > Medtronic Inc.,
> > 763-514-3799
>
>
> HTH
> Josh Perlmutter

greedy * in (\S.*) can leave a space at the end of the string:

[code]

$input = qq( example \n information in \n\n a string );
$input =~ s/\s+/ /g;
print qq({$input}\n);
$input =~ s/^\s*(\S.*)\s*$/$1/;
print qq({$input}\n);

[output]

{ example information in a string }
{example information in a string }

lazy * in (\S.*?) leaves no space:

[code]

$input = qq( example \n information in \n\n a string );
$input =~ s/\s+/ /g;
print qq({$input}\n);
$input =~ s/^\s*(\S.*?)\s*$/$1/;
print qq({$input}\n);

[output]

{ example information in a string }
{example information in a string}

(also, the \S in the parenthetic expressions is redundant.)
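a quick way to check that remark is to run the lazy pattern with and without the \S. because the leading ^\s* is greedy and is tried first, it has already consumed the leading whitespace by the time the group starts, so both give the same result on ordinary input. the one difference shows up on a string that is nothing but whitespace: with \S the pattern fails and the string is left as-is, without it the string becomes empty. the helper names are made up:

```perl
use strict;
use warnings;

# hypothetical helpers: the lazy trim pattern with and without \S
sub trim_with_S    { my ($s) = @_; $s =~ s/^\s*(\S.*?)\s*$/$1/; return $s; }
sub trim_without_S { my ($s) = @_; $s =~ s/^\s*(.*?)\s*$/$1/;   return $s; }

# ordinary input first, then an all-whitespace edge case
for my $s (" example information in a string ", "   ") {
    printf "{%s} {%s}\n", trim_with_S($s), trim_without_S($s);
}
```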

hth -- bill walters

_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
