Re: code to parse out HTML tags

Janek Schleicher Fri, 27 Sep 2002 10:35:46 -0700

Shawn Milochik wrote at Fri, 27 Sep 2002 16:44:52 +0200:

> I have some code that I've been working on.  I know it will be painfully
> obvious to you that I'm not very experienced yet with Perl, because I know
> I'm not taking advantage of the full text-processing power it has.  What I
> want to do is to read an HTML file and find  all occurrences of an unclosed
> tag, like a <td> without an </td>.  So far, the code lets me know how many
> tags aren't closed, but not which ones, which I think will be more
> complicated, because I'll have to use some hashes to keep track every time
> I enter a new <table> tag.


You should really use an HTML::* module.

> Another thing I want to do soon is to be able to read an HTML file and
> parse out contents of a drop-down box into a .CSV.
> 
> Example:
> 
> <select name=drpSample>
>    <option value=1>first thing</option>
>    <option value='fred'>Fred</option>
>    <option value="Jack & Jill">rhyme</option>
> </select>
> 
> I'd like a Perl script to return a .CSV containing:
> 
> *************************
> 1, first thing
> fred, Fred
> Jack & Jill, rhyme
> *************************


That's not valid CSV.
Usually, it would be
1,"first thing"
fred,Fred
"Jack & Jill",rhyme
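
To make the quoting explicit, here is a minimal sketch in core Perl. The helpers csv_field and csv_line are hypothetical names, not part of any module. It assumes the convention shown in the lines above, where a field is quoted when it contains whitespace, a comma, or a double quote. Stricter CSV dialects only require quoting for commas, quotes, and line breaks, so treat the whitespace rule as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a field if it contains whitespace, a comma,
# or a double quote. Embedded double quotes are doubled.
sub csv_field {
    my ($field) = @_;
    return $field unless $field =~ /[\s,"]/;
    (my $escaped = $field) =~ s/"/""/g;
    return qq{"$escaped"};
}

# Hypothetical helper: join already-quoted fields into one CSV line.
sub csv_line {
    return join ',', map { csv_field($_) } @_;
}

# The three rows from the example above.
for my $row ([1, 'first thing'], ['fred', 'Fred'], ['Jack & Jill', 'rhyme']) {
    print csv_line(@$row), "\n";
}
```

A real CSV module handles more cases (embedded newlines, other separators), so this is only meant to show why two of the three example lines gain quotes.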

Here's a snippet that does the job,
assuming every option sits on a single line:

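use Tie::CSV_File;  # or another CSV module, search for them on CPAN

# $html_file is a placeholder name; here both file names come from @ARGV
my ($html_file, $fname) = @ARGV;
open HTML, '<', $html_file or die "Cannot open $html_file: $!";
tie my @data, 'Tie::CSV_File', $fname;

while (<HTML>) {
   # $1 is the (optionally quoted) value, $2 the option text
   if (/<option value=['"]?(.*?)['"]?>(.*?)</) {
      push @data, [$1, $2];
   }
}
close HTML;
untie @data;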
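
If the options are not guaranteed to sit on one line each, the same idea can be applied to the whole document at once. The following is a rough sketch in core Perl only. extract_options is a hypothetical helper name, and it is still a regex, not a real HTML parser, so comments, nested markup, or unusual attribute order will trip it up. An HTML::* module remains the more robust choice.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [value, text] pairs, one per
# <option> tag found in $html. Handles double-quoted, single-quoted and
# unquoted values, and tolerates line breaks inside and after the tag.
sub extract_options {
    my ($html) = @_;
    my @rows;
    while ($html =~ m{<option\s+value\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))[^>]*>\s*(.*?)\s*<}gis) {
        my $value = defined $1 ? $1 : defined $2 ? $2 : $3;
        push @rows, [$value, $4];
    }
    return @rows;
}

# Sample input based on the example in the question, with one option
# deliberately spread over several lines.
my $sample = <<'HTML';
<select name=drpSample>
   <option value=1>first thing</option>
   <option value='fred'>Fred</option>
   <option
      value="Jack & Jill">
      rhyme
   </option>
</select>
HTML

printf "%s => %s\n", @$_ for extract_options($sample);
```

Each pair returned can then be pushed onto the tied @data array exactly as in the snippet above.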

Best Wishes,
Janek

