I have some code that I've been working on.  I know it will be painfully
obvious to you that I'm not very experienced yet with Perl, because I know
I'm not taking advantage of the full text-processing power it has.  What I
want to do is to read an HTML file and find all occurrences of an unclosed
tag, like a <td> without a </td>.  So far, the code lets me know how many
tags aren't closed, but not which ones, which I think will be more
complicated, because I'll have to use some hashes to keep track every time
I enter a new <table> tag.
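
For illustration, here is a rough sketch of the hash idea in its simplest form: one hash counts openers per tag name, another counts closers, and any name with more openers than closers is reported. The sample string, the variable names, and the list of tags that take no closer are all assumptions made up for this example. It only says which tag names are unbalanced, not where in the file or in which <table> the problem sits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; a real script would read the file instead.
my $html = '<table><tr><td>one<td>two</td></tr><br></table>';

# Tags that never take a closer (assumed list, extend as needed).
my %void = map { $_ => 1 } qw(br hr input img);

my (%opened, %closed);

# Walk every tag: $1 is "/" for a closer, $2 is the tag name.
while ($html =~ m{<(/?)([a-zA-Z][a-zA-Z0-9]*)}g) {
    my ($slash, $name) = ($1, lc $2);
    next if $void{$name};
    if ($slash) { $closed{$name}++ } else { $opened{$name}++ }
}

# Keep each tag name whose openers outnumber its closers.
my %missing;
for my $name (sort keys %opened) {
    my $diff = $opened{$name} - ($closed{$name} || 0);
    $missing{$name} = $diff if $diff > 0;
}

print "$_: $missing{$_} unclosed\n" for sort keys %missing;
```

Tracking position as well would probably need a stack of open tags rather than plain counters, which this sketch does not attempt.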

I'm not asking anyone to finish this functionality.  If someone can just
trim the fat to show me how to do better what I've completed so far, I'd
appreciate it.

Another thing I want to do soon is to be able to read an HTML file and
parse out contents of a drop-down box into a .CSV.

Example:

<select name=drpSample>
   <option value=1>first thing</option>
   <option value='fred'>Fred</option>
   <option value="Jack & Jill">rhyme</option>
</select>

I'd like a Perl script to return a .CSV containing:

*************************
1, first thing
fred, Fred
Jack & Jill, rhyme
*************************
Can anyone point me towards the proper functions in Perl to read up on so
that I can:
1.  Get the value from the option, regardless of whether it is surrounded
by single quotes, double quotes, or nothing.
2.  Get the listing from between the <option> and </option> tags.
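
For reference, one possible approach to both steps is a single regular expression with one alternative per quoting style. The sketch below is only that, a sketch: it assumes every <option> has a value attribute as its first attribute and a matching </option>, and it does no CSV quoting, so a value containing a comma would need extra handling. The variable names are made up for the example. For real-world HTML, a parser module from CPAN such as HTML::Parser is likely to be more robust than a regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample from above; a real script would read this from the HTML file.
my $html = <<'END_HTML';
<select name=drpSample>
   <option value=1>first thing</option>
   <option value='fred'>Fred</option>
   <option value="Jack & Jill">rhyme</option>
</select>
END_HTML

my @rows;

# One alternative per quoting style: "double", 'single', or bare.
# Exactly one of $1, $2, $3 is defined; $4 is the text between the tags.
while ($html =~ m{<option\s+value\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))[^>]*>(.*?)</option>}gis) {
    my $value = defined $1 ? $1 : defined $2 ? $2 : $3;
    my $label = $4;
    push @rows, "$value, $label";
}

print "$_\n" for @rows;
```

Run against the sample, this prints the three lines shown between the asterisks above.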

Thank you,
Shawn


My code for the first question follows:


%%%%%%%%%%%%%%%%%%
Beginning of code
%%%%%%%%%%%%%%%%%%

#!/usr/bin/perl

open(FILE, '<', $ARGV[0]) or die "Cannot open '$ARGV[0]': $!\n";

@data = <FILE>;

$position = 0;  #Current position within the string I'm reading.
$tagCount = 0;  #number of tags found so far
$brhrInput = 0; #Number of tags with no closers (<br>, <hr>, <input)
$formattingTags = 0;    #Formatting tags which require closers, <TD>, <TR>, <FONT>, etc...
$openFormatting = 0;    #Number of $formattingTags which are openers <tag>
$closeFormatting = 0;   #Number of $formattingTags which are closers </tag>

foreach (@data){

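    #Uppercase the no-closer tag names once per line, before scanning it,
    #so the substr() checks below match regardless of case.
    s/<(br|hr|input)/<\U$1/gi;

    $position = 0;
    while ($position < length($_)){
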

        if (substr($_, $position, 1) eq "<"){
            # Increment by one
            $tagCount ++;


            #Checking for non-formatting tags (those which require no closing tags)
            if (   (substr($_, $position, 4) eq "<BR>")
                || (substr($_, $position, 4) eq "<HR>")
                || (substr($_, $position, 6) eq "<INPUT")){
                $brhrInput ++;
            }else{
                $formattingTags ++;

                if (substr($_, $position, 2) eq "</"){
                    $closeFormatting ++;
                }else{
                    $openFormatting ++;

                }

            }
        }
        $position ++;

    }


}

#Print the top border
print chr(201);
print chr(205) x (length($ARGV[0]) + 17);
print chr(187);

#Print the left border, the text, and the right border.
print "\n";
print chr(186);
print " Results for '$ARGV[0]': ";
print chr(186) . "\n";

#Print the bottom border
print chr(200);
print chr(205) x (length($ARGV[0]) + 17);
print chr(188) . "\n\n";


print chr(4) . "This file has $tagCount occurrences of the character \"<\".\n";
print chr(4) . "This file has $formattingTags formatting tags.\n";
print chr(4) . "This file has $openFormatting formatting tags opened.\n";
print chr(4) . "This file has $closeFormatting formatting tags closed.\n";
print chr(4) . "This file has " . ($openFormatting - $closeFormatting) . " missing closing tags.\n";
print chr(4) . "This file has $brhrInput occurrences of the \"<BR>\", \"<HR>\" or \"<INPUT\" tags.\n";


%%%%%%%%%%%%%%%%%%
End of code
%%%%%%%%%%%%%%%%%%




