Re: String Parse

Frazier, Joe Jr Thu, 26 Dec 2002 04:50:20 -0800

> And the third, the worse one, it's parsing input received, 
> and it's like
> this:
> 
>       while($content =~ m|<td valign=\"top\" 
> class=\"list-item-odd\"><a href=\"(.*?)\" 
> class=\"list-item-title\"><b>(.*?)</b></a><br>([\w\W]*?)<td 
> valign=\"top\" class=\"list-item-odd\">(.*?)<br>([\w\W]*?)<td 
> valign=\"top\" 
> class=\"list-item-odd\">(.*?)</td>([\w\W]*?)<td 
> valign=\"top\" class=\"list-item-odd\">(.*?)</td>|igmo) {
> 
> 
> But the value, every EVEN time, which it's therefore missing 
> (I'm assuming)
> looks like this:
> 
>       while($content =~ m|<td valign=\"top\" 
> class=\"list-item\"><a href=\"(.*?)\" 
> class=\"list-item-title\"><b>(.*?)</b></a><br>([\w\W]*?)<td 
> valign=\"top\" class=\"list-item\">(.*?)<br>([\w\W]*?)<td 
> valign=\"top\" class=\"list-item\">(.*?)</td>([\w\W]*?)<td 
> valign=\"top\" class=\"list-item\">(.*?)</td>|igmo) {
> 
> 
> What could the line look like if it was only looking for 
> list-item in the line
> with everything else, no matter if its a line with 
> list-item-odd or not?
> 
> Thank you so very much!  I've been working on these 
> modifications and working
> on them, but I need sleep badly, so I'm giving up and flat 
> out asking for
> help in solving it.  Thank you, I appreciate your time.
> 
> Merry Christmas,
> 
> Steve



Steve, since I am not quite sure what you want to do, you may wish to check out 
XML::LibXML.  This module parses XML or HTML fragments into a DOM structure.  You can 
then run XPath node lookups based on your criteria and drill down to get the data you 
need.  If you want the value of every list item, no matter what its parent contains, 
then XPath would be a very good fit, at the cost of adding more code and possibly 
learning a new skill set (all things considered, learning to use XML::LibXML is a VERY 
good thing).  It also holds up better when the markup changes: with the regex, if 
someone changes the HTML in any way (adding a new attribute to the tag, for example), 
you would have to go back and debug your script.  Plus, you would avoid a good deal of 
possible backtracking and backreferences.  That advantage may be offset by the size of 
the HTML string passed in, since a DOM increases the in-memory size several times over.
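
As a rough sketch of the XPath approach, something like the following might work.  The 
sample markup, the paths in it, and the variable names are made up for illustration and 
only modelled on the pattern quoted above, so the XPath expression would need adjusting 
to the real page.  It also assumes a version of XML::LibXML that provides load_html.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample markup, modelled on the quoted regex (not the real page).
my $content = <<'HTML';
<table>
<tr>
<td valign="top" class="list-item-odd"><a href="/jobs/1" class="list-item-title"><b>First job</b></a><br>Details</td>
<td valign="top" class="list-item-odd">Seattle</td>
</tr>
<tr>
<td valign="top" class="list-item"><a href="/jobs/2" class="list-item-title"><b>Second job</b></a><br>Details</td>
<td valign="top" class="list-item">Portland</td>
</tr>
</table>
HTML

# recover => 2 asks libxml2 to tolerate sloppy HTML without printing warnings.
my $doc = XML::LibXML->load_html(string => $content, recover => 2);

# Select the title link in every cell, whether its class is list-item or list-item-odd.
my $xpath = '//td[@class="list-item" or @class="list-item-odd"]'
          . '/a[@class="list-item-title"]';

my @links;
for my $link ($doc->findnodes($xpath)) {
    push @links, {
        href  => $link->getAttribute('href'),
        title => $link->findvalue('b'),      # text of the <b> child
    };
}

print "$_->{href}\t$_->{title}\n" for @links;
```

Because the lookup keys on element names and class values rather than on the exact text 
of each tag, an extra attribute or a change in whitespace should not break it.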

I currently use XML::LibXML to parse job posting boards, drilling down through several 
layers of pages (fetched with LWP) and verifying that my clients' jobs are all posted by 
comparing the data on each page to a DB-generated list of jobs that should be on the 
board.  If you think you might go that way, get the ppm from the 
http://theoryx5.uwinnipeg.ca/ppmpackages/ repository (there are two repositories now, 
one for 5.6 and one for 5.8.  If Randy or anyone else is listening, can you give the 
correct URL for each?).
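
If you would rather stay with the regex for now, one small change that might cover both 
kinds of rows is to make the "-odd" suffix optional.  This is only a sketch: the sample 
data and the names $cell and @rows are made up, and the pattern is shortened to the 
first cell of your original, so the remaining cells would need the same treatment.

```perl
use strict;
use warnings;

# Hypothetical sample input, shortened to the first cell of each row.
my $content = <<'HTML';
<td valign="top" class="list-item-odd"><a href="/jobs/1" class="list-item-title"><b>First job</b></a><br>
<td valign="top" class="list-item"><a href="/jobs/2" class="list-item-title"><b>Second job</b></a><br>
HTML

# (?:-odd)? makes the suffix optional, so one pattern matches both classes.
my $cell = qr{<td valign="top" class="list-item(?:-odd)?">}i;

my @rows;
while ($content =~ m{$cell<a href="(.*?)" class="list-item-title"><b>(.*?)</b></a><br>}gis) {
    push @rows, [$1, $2];    # [href, title]
}

print "$_->[0]\t$_->[1]\n" for @rows;
```

Building the repeated cell prefix once with qr// also keeps the long pattern easier to 
read, though it is still tied to the exact attribute order and spacing in the HTML.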

Joe
_______________________________________________
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
