Re: help with a regex and greediness

R. Joseph Newton Sat, 13 Mar 2004 14:12:26 -0800

Stuart White wrote:



> $line =
> "merlyn::118:10:Randal:/home/merlyn?/usr/bin/perl";
> @fields = split(/:/,$line);
> #now @fields is ("merlyn",
> "","118","10","Randal","/home/merlyn","/usr/bin/perl")

I don't think so:

Greetings! E:\d_drive\perlStuff>perl -w
$line =
  "merlyn::118:10:Randal:/home/merlyn?/usr/bin/perl";
  @fields = split(/:/,$line);
  print "$_\n" foreach @fields;

^Z
merlyn

118
10
Randal
/home/merlyn?/usr/bin/perl

The only place a colon split splits is where you have a colon.

> I was having trouble using this example to figure out
> why my line wasn't splitting the way I wanted.  So I
> included this line:
> $line = 'Spurs 94, Suns 82, Heat 99, Magic 74'
>
> > > then @result would look like this, right?
> > > @result[0] = 'Spurs 94'
> >
> > You should know better than this by now, with the
> > help you've been getting.
> > With that @ symbol, you are referring to a slice--an
> > array of one element.  ***
> > When you are referring to a scalar, use the scalar
> > symbol $ ***
> >
>
> Yes, that was an error on my part.  You're right, I
> know that it should have been $result[0]

OK.  It can be tricky, and it took me a few go-rounds to get that down.
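A minimal sketch of the difference, using the sample strings from this thread. The names $first and @pair are made up purely for illustration:

```perl
use strict;
use warnings;

my @result = ('Spurs 94', 'Suns 82');

# $result[0] is one element of @result: a scalar, so it takes the $ symbol.
my $first = $result[0];

# @result[0, 1] is a slice: a list of the elements at those indices.
my @pair = @result[0, 1];

# @result[0] is a one-element slice.  It often appears to work, but under
# warnings perl complains:
#   Scalar value @result[0] better written as $result[0]
print "$first\n";
print join(' | ', @pair), "\n";
```

The rule of thumb: the symbol says what you are getting back (one thing or a list), not what kind of container it came from.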

> > > @result[1] = 'Suns 82'
> >
> > These first two make sense, pretty much.  I think
> > this is one place where $team1
> > and $team2 might be more sensible, though it is even
> > better, if there is some
> > order to which team is listed first in the pairing,
> > to have your identifier
> > reflect that order, say $home_team and $visitor [if
> > these are accurate of
> > course]
>
> I'll have to study the data to make sure that the home
> team and visiting teams are consistently in the same
> place.  That's a good idea too.

...or you could even use $team_left and $team_right, since those terms would
accurately describe the relative positions of the two substrings within each
line.

>
>
> >
> > >
> > > @result[2] = 'Heat 99'
> >
> > Going on to load more elements into the array does
> > not make sense.  Does your
> > data come in one continuous line, just a long string
> > of team names separated by
> > commas?
>
> Addressed above.

OK.  Sorry I missed the runup.

>
>
> > My impression is that it came line by line.
> > There would be no sense in
> > doing the work of the split only to throw everything
> > back in the same pile.
>
> > There are a lot of different things you could do
> > here, but the sensible ones
> > would indicate that you should do something with the
> > stats for each pairing
> > before you go on to the next line.
> >
> >
> > >
> > > @result[3] = 'Magic 74'
> > >
> > > If I wanted to split on the numbers as well, why
> > > doesn't this work:
> > > @result = split (/\s*\d*,\s*\d*/, $line);
> >
> > The previous post already explained this, and you
> > have seen the result of what
> > you are trying.  You can't do that because the
> > information disappears if you use
> > it in the split expression.
> >
>
> I see.  It seems that I didn't pick up on this
> entirely, though I do remember reading it.
>
> > Splitting the lines into a pair of team-score
> > combinations is one step.  It
> > deserves a line of its own.
> > Extracting the name and score from each team-score
> > clause is another step that
> > deserves a line or three of its own.
> >
>
> Ok, I didn't know this.  I thought I could, and should
> do it all in one or two lines.  I get confused about
> what data $_ has sometimes.

This is a very good indicator that you should be using the default $_ less and
named variables more.  It all may look the same to the compiler, but since human
error is the most likely cause of problems, it is more important to be as
understandable as possible to the human reader.

> After I run the initial
> regex, I am usually extracting information from the
> backreferences.  When those backreferences or $_
> contain more info than I want, my solution is to
> tighten the original regex.  You are suggesting that
> instead of that, I ought to just run a second regex on
> it, or a split on it in order to take the stress off
> of Perl and keep the program efficient, right?  Is
> that what you are suggesting?

Whenever possible, yes.  Try to think through the process the regex engine will
take.  The classic example is the line trim.  You can do this

$string =~ s/^\s*(.*?)\s*$/$1/;
and it will work, but the engine has to pack all the middle portion around.  It
works much better as:
$string =~ s/^\s+//;
$string =~ s/\s+$//;

Likewise, a couple of well-placed simple splits should handle your lines quite
efficiently--and clearly:

my ($scoreboard_left, $scoreboard_right) = split /,\s*/, $scores_line;
for ($scoreboard_left, $scoreboard_right) {
    my ($team_name, $score) = split /\s+/, $_;
    # ...whatever you are doing with the score
}

I must confess that I am a bit at a loss as to what to do with the information
at this point.  I have forgotten what your goal is in terms of the form of
output information you want.  Presumably, you should have some structure ready
to receive the information extracted here, so you can assign the value extracted
to those structures.
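Purely as an illustration of what such a structure might look like, here is a sketch that totals points per team in a hash.  The hash name %points_for, the two-pairings-per-run sample data, and the assumption that every team name is a single word are all mine, not anything you have told us about your data:

```perl
use strict;
use warnings;

# Hypothetical receiving structure: total points scored, keyed by team name.
my %points_for;

# Assumed input: one pairing per line, as in the sample strings in this thread.
my @score_lines = (
    'Spurs 94, Suns 82',
    'Heat 99, Magic 74',
);

for my $scores_line (@score_lines) {
    # Step 1: split the line into its two team-score clauses.
    my ($scoreboard_left, $scoreboard_right) = split /,\s*/, $scores_line;

    for my $scoreboard ($scoreboard_left, $scoreboard_right) {
        # Step 2: split each clause into a name and a score.
        # This assumes single-word team names.
        my ($team_name, $score) = split /\s+/, $scoreboard;
        $points_for{$team_name} += $score;
    }
}

print "$_: $points_for{$_}\n" for sort keys %points_for;
```

If your real goal is something other than totals, the two split steps stay the same and only the line that stores the result changes.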

> >
> > Though its Perl implementation is highly efficient,
> > the regex process is very
> > costly, and the cost rises much more through
> > complexity of expression than
> > through multiple runs.
>
> Ok, see, I thought that the program would run much
> more slowly if I kept running through the data.  I
> didn't think to use regular expressions in steps.

Well, the regex here is only the vehicle to carry the split, so you don't want
to put too much focus on that.  More like split in steps, each using a simple
regex.
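To make that concrete, here is a small sketch contrasting the two approaches on one of your sample pairings.  The variable names @lost and @kept are just for illustration:

```perl
use strict;
use warnings;

my $line = 'Spurs 94, Suns 82';

# The digits are part of the separator pattern here, so split throws the
# first score away along with the comma.
my @lost = split /\s*\d+,\s*/, $line;

# Splitting in steps keeps everything: first on the comma, then on whitespace.
my @kept;
for my $clause (split /,\s*/, $line) {
    push @kept, split /\s+/, $clause;
}

print join('|', @lost), "\n";
print join('|', @kept), "\n";
```

The first split leaves you with 'Spurs' and 'Suns 82'; the 94 is gone because it matched the separator.  The stepwise version ends up with all four pieces.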

>
> > Please review
> > perldoc -f split
> > for a better understanding.  The split regex is
> > *what gets thrown away*.  Do
> > not put any data you may need in it.
> >
>
> The formatting of perldoc from the command line makes
> it terribly difficult to read for me.

prompt> perldoc -f split > "perldoc/split.txt"

...doesn't seem to work on Acme::EyeDrops, but I can't think of anything else
this has failed on.  This presumes that you have a child directory named perldoc,
of course.  If you are in a sibling folder to your perldoc texts folder, you may
wish to use something more like:
prompt> perldoc -f split > "../perldoc/split.txt"
...with my deepest apologies to anyone offended by Windows-style extensions.

>  There are huge
> tabs between words.  Is perldoc available in another
> format, say the web?

Are you on Windows, with ActiveState?  If so:
Start|Programs|ActiveState ActivePerl 5.8|Documentation
will get you going.  I use it along with the texts I produce using the method
shown above.  Different styles seem more comfortable depending on context, type
of inquiry, and the phase of the moon [ooow-oooooooh!]


> > Keep it grounded--by choosing identifiers carefully to always communicate
> > clearly what information they hold.
> > Keep it simple--most things are, if you let them be.
> > Do one thing per line until you are using all of the basic constructs
> > fluently.
> > Pay close attention to the nature of each thing you are using a variable to
> > describe, and make sure the containment class symbol [$, @, or %] that you
> > use reflects accurately whether you are referring to a container, or to an
> > element held in the container.
> >
>
> I'm trying.  Thanks for the advice.
>

I can see that, and I respect the way you are sticking with it.  Keep it
up--you'll get there.

Joseph


