I have some code that I've been working on. I know it will be painfully obvious to you that I'm not very experienced yet with Perl, because I know I'm not taking advantage of the full text-processing power it has. What I want to do is to read an HTML file and find all occurrences of an unclosed tag, like a <td> without an </td>. So far, the code lets me know how many tags aren't closed, but not which ones, which I think will be more complicated, because I'll have to use some hashes to keep track every time I enter a new <table> tag.
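One possible way to report which tags are unclosed, rather than only how many, is to keep a stack of open tag names instead of hashes. What follows is only a minimal sketch under stated assumptions: the helper name `unclosed_tags`, the list of no-closer tags, and the rule that a closer matches the nearest opener of the same name are all illustrative choices, and tags that span several lines are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tags that never take a closer (an assumed, incomplete list).
my %no_closer = map { $_ => 1 } qw(br hr input img);

# Hypothetical helper: takes the lines of an HTML file and returns one
# message per opening tag that never received a matching closer.
sub unclosed_tags {
    my @lines = @_;
    my (@stack, @unclosed);    # each entry is [tag name, line number]
    my $line_no = 0;

    for my $text (@lines) {
        $line_no++;
        # Find every <tag ...> or </tag> on this line.
        while ($text =~ m{<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>}g) {
            my ($closing, $name) = ($1, lc $2);
            next if $no_closer{$name};

            if (!$closing) {
                push @stack, [$name, $line_no];
                next;
            }

            # A closer: look for the nearest opener with the same name.
            my ($idx) = grep { $stack[$_][0] eq $name } reverse 0 .. $#stack;
            next unless defined $idx;    # stray closer, ignored here

            # Anything opened after that opener was never closed.
            push @unclosed, splice(@stack, $idx + 1);
            pop @stack;                  # remove the matched opener
        }
    }

    # Whatever is still on the stack at the end is unclosed too.
    return map { sprintf('<%s> opened on line %d', @$_) } (@unclosed, @stack);
}

my @sample = ("<table>\n", "<tr><td>one\n", "<td>two</td></tr>\n", "</table>\n");
print "$_\n" for unclosed_tags(@sample);    # prints: <td> opened on line 2
```

The stack keeps nesting information that a plain counter loses, so the first `<td>` in the sample is reported even though the total number of closers looks only one short.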
I'm not asking anyone to finish this functionality. If someone can just trim the fat and show me how to do better what I've completed so far, I'd appreciate it.

Another thing I want to do soon is to read an HTML file and parse the contents of a drop-down box into a .CSV file. Example:

<select name=drpSample>
<option value=1>first thing</option>
<option value='fred'>Fred</option>
<option value="Jack & Jill">rhyme</option>
</select>

I'd like a Perl script to return a .CSV containing:

*************************
1, first thing
fred, Fred
Jack & Jill, rhyme
*************************

Can anyone point me towards the proper functions in Perl to read up on so that I can:

1. Get the value from the option, regardless of whether it is surrounded by single quotes, double quotes, or nothing.
2. Get the listing from between the <option> and </option> tags.

Thank you,
Shawn

My code for the first question follows:

%%%%%%%%%%%%%%%%%% Beginning of code %%%%%%%%%%%%%%%%%%
#!/usr/bin/perl
use strict;
use warnings;

my $file = $ARGV[0];
die "Usage: $0 file.html\n" unless defined $file;
open(my $fh, '<', $file) or die "Cannot open '$file': $!\n";
my @data = <$fh>;
close($fh);

my $position = 0;        #Current position within the string I'm reading.
my $tagCount = 0;        #Number of tags found so far
my $brhrInput = 0;       #Number of tags with no closers (<br>, <hr>, <input)
my $formattingTags = 0;  #Formatting tags which require closers, <TD>, <TR>, <FONT>, etc...
my $openFormatting = 0;  #Number of $formattingTags which are openers <tag>
my $closeFormatting = 0; #Number of $formattingTags which are closers </tag>

foreach (@data){
    #Upper-case the no-closer tag names once per line so the checks below match.
    s/<(br|hr|input)/<\U$1/gi;
    $position = 0;
    while ($position < length($_)){
        if (substr($_, $position, 1) eq "<"){
            # Increment by one
            $tagCount ++;
            #Checking for non-formatting tags (those which require no closing tags)
            if ((substr($_, $position, 4) eq "<BR>")||(substr($_, $position, 4) eq "<HR>")||(substr($_, $position, 6) eq "<INPUT")){
                $brhrInput ++;
            }else{
                $formattingTags ++;
                if (substr($_, $position, 2) eq "</"){
                    $closeFormatting ++;
                }else{
                    $openFormatting ++;
                }
            }
        }
        $position ++;
    }
}

#Print the top border
print chr(201);
print chr(205) x (length($file) + 17);
print chr(187);

#Print the left border, the text, and the right border.
print "\n";
print chr(186);
print " Results for '$file': ";
print chr(186) . "\n";

#Print the bottom border
print chr(200);
print chr(205) x (length($file) + 17);
print chr(188) . "\n\n";

print chr(4) . "This file has $tagCount occurrences of the character \"<\". \n";
print chr(4) . "This file has $formattingTags formatting tags.\n";
print chr(4) . "This file has $openFormatting formatting tags opened.\n";
print chr(4) . "This file has $closeFormatting formatting tags closed.\n";
print chr(4) . "This file has " . ($openFormatting - $closeFormatting) . " missing closing tags.\n";
print chr(4) . "This file has $brhrInput occurrences of the \"<BR>\", \"<HR>\" or \"<INPUT\" tags.\n";
%%%%%%%%%%%%%%%%%% End of code %%%%%%%%%%%%%%%%%%
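For the drop-down question, the two requirements (a value that may be double-quoted, single-quoted, or bare, plus the text between `<option>` and `</option>`) can be covered by a single regular expression with alternation. This is only a rough sketch: the helper name `options_to_csv` is made up for illustration, it assumes every option carries a `value` attribute directly after the tag name, and it does no CSV quoting, so a value containing a comma would need extra handling. A real HTML parser such as the CPAN module HTML::TokeParser would likely be more robust on messy pages.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one "value, text" string per <option> found.
sub options_to_csv {
    my ($html) = @_;
    my @rows;

    while ($html =~ m{
            <option \s+ value \s* = \s*
            (?: "([^"]*)"      # capture 1: double-quoted value
              | '([^']*)'      # capture 2: single-quoted value
              | ([^\s>]+)      # capture 3: bare value
            )
            [^>]* >            # rest of the opening tag
            (.*?)              # capture 4: text up to the closer
            </option>
        }gisx) {
        # Only one of the three value captures is defined for any match.
        my $value = defined $1 ? $1 : defined $2 ? $2 : $3;
        push @rows, "$value, $4";
    }
    return @rows;
}

my $sample_html = <<'HTML';
<select name=drpSample>
<option value=1>first thing</option>
<option value='fred'>Fred</option>
<option value="Jack & Jill">rhyme</option>
</select>
HTML

print "$_\n" for options_to_csv($sample_html);
```

Run against the sample from the post, this prints the three lines `1, first thing`, `fred, Fred` and `Jack & Jill, rhyme`. The parts worth reading up on are alternation and capture groups in perlre, plus the `/g`, `/i`, `/s` and `/x` modifiers.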