I've got a little problem with HTML tags. Here's the description.

My site accepts HTML files by upload. Many of these files are written in MS
Word and then saved as HTML from there. MS Word likes to put a bunch of
garbage at the beginning of the file. When users upload their HTML files, my
script strips out all of the unnecessary junk, except that it can't get rid
of the junk (HTML, XML, CSS, JavaScript) at the beginning of the file. Some
of these tags span multiple lines, and my script goes through the file line
by line, so it doesn't recognize them as tags. Is there a simpler way to do
this? I don't need the style-sheet information, because I have my own style
sheet that takes care of styling the files the way they should be. I don't
want the extra tags, even though they're invisible to users viewing the page
in a browser, because these files are also sent by e-mail: for HTML mail
they're fine, but for text mail I need to strip everything down to plain
text, and that's the problem.
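
To make the multi-line issue concrete, here is a minimal sketch of one way
around it: treat the whole file as a single string instead of going line by
line, and use patterns that are allowed to match across newlines. It is
written in Python purely for illustration and is not the script described
above; the function name word_html_to_text is made up for this example. It
assumes that everything inside <head>, <style>, <script>, <xml> and HTML
comments can be thrown away for the text version.

```python
import html
import re

# re.DOTALL lets "." match newlines, so one match can cover a comment or
# block that spans many lines. The back-reference \1 makes sure a block
# is closed by the same tag name that opened it.
_BLOCKS = re.compile(
    r"<!--.*?-->"                                  # comments, incl. <!--[if ...]> ... <![endif]-->
    r"|<(head|style|script|xml)\b.*?</\1\s*>",     # whole blocks with their content
    re.DOTALL | re.IGNORECASE,
)

# [^>] also matches a newline, so a tag split over two lines is still one match.
_TAGS = re.compile(r"<[^>]*>")


def word_html_to_text(source: str) -> str:
    """Return a plain-text version of Word-generated HTML (hypothetical helper)."""
    text = _BLOCKS.sub("", source)      # drop head, styles, scripts, XML, comments
    text = _TAGS.sub("", text)          # drop the remaining tags, keep their text
    text = html.unescape(text).replace("\xa0", " ")   # &nbsp; and friends
    # Collapse runs of whitespace on each line and drop lines left empty.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Regular expressions are not a real HTML parser, so this is only reasonable
for fairly predictable input such as Word's export. The same idea should
carry over to any language whose regex engine has a "dot matches newline"
option.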

=================================================
Just in case, I've included the HTML code below:


<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 10">
<meta name=Originator content="Microsoft Word 10">
<link rel=File-List href="NW100_files/filelist.xml">
<title>Test test test</title>
<!--[if gte mso 9]><xml>
 <o:DocumentProperties>
  <o:Author>Liam Gibbs</o:Author>
  <o:LastAuthor>Liam Gibbs</o:LastAuthor>
  <o:Revision>1</o:Revision>
  <o:TotalTime>1</o:TotalTime>
  <o:Created>2002-08-30T18:09:00Z</o:Created>
  <o:LastSaved>2002-08-30T18:10:00Z</o:LastSaved>
  <o:Pages>1</o:Pages>
  <o:Words>13</o:Words>
  <o:Characters>79</o:Characters>
  <o:Company>SXIA</o:Company>
  <o:Lines>1</o:Lines>
  <o:Paragraphs>1</o:Paragraphs>
  <o:CharactersWithSpaces>91</o:CharactersWithSpaces>
  <o:Version>10.3501</o:Version>
 </o:DocumentProperties>
</xml><![endif]--><!--[if gte mso 9]><xml>
 <w:WordDocument>
  <w:SpellingState>Clean</w:SpellingState>
  <w:GrammarState>Clean</w:GrammarState>
  <w:Compatibility>
   <w:BreakWrappedTables/>
   <w:SnapToGridInCell/>
   <w:WrapTextWithPunct/>
   <w:UseAsianBreakRules/>
  </w:Compatibility>
  <w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel>
 </w:WordDocument>
</xml><![endif]-->
<style>
<!--
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
        {mso-style-parent:"";
        margin:0cm;
        margin-bottom:.0001pt;
        mso-pagination:widow-orphan;
        font-size:12.0pt;
        font-family:"Times New Roman";
        mso-fareast-font-family:"Times New Roman";}
span.SpellE
        {mso-style-name:"";
        mso-spl-e:yes;}
@page Section1
        {size:612.0pt 792.0pt;
        margin:72.0pt 90.0pt 72.0pt 90.0pt;
        mso-header-margin:35.4pt;
        mso-footer-margin:35.4pt;
        mso-paper-source:0;}
div.Section1
        {page:Section1;}
-->
</style>
<!--[if gte mso 10]>
<style>
 /* Style Definitions */
 table.MsoNormalTable
        {mso-style-name:"Table Normal";
        mso-tstyle-rowband-size:0;
        mso-tstyle-colband-size:0;
        mso-style-noshow:yes;
        mso-style-parent:"";
        mso-padding-alt:0cm 5.4pt 0cm 5.4pt;
        mso-para-margin:0cm;
        mso-para-margin-bottom:.0001pt;
        mso-pagination:widow-orphan;
        font-size:10.0pt;
        font-family:"Times New Roman";}
</style>
<![endif]-->
</head>

<body lang=EN-US style='tab-interval:36.0pt'>

<div class=Section1>

<p class=MsoNormal>Test <span class=SpellE>test</span> <span
class=SpellE>test</span></p>

<p class=MsoNormal align=center style='text-align:center'><span
class=SpellE>Fdjfkasdjfkla</span></p>

<p class=MsoNormal align=center style='text-align:center'><span
class=SpellE><b
style='mso-bidi-font-weight:normal'>Fdjkslafjdklaf</b></span></p>

<p class=MsoNormal style='text-align:justify'><o:p>&nbsp;</o:p></p>

<p class=MsoNormal style='text-align:justify'><span
class=SpellE>Fdasfdfasffasdfdaadfdfs</span></p>

<p class=MsoNormal style='text-align:justify'><span
class=SpellE>Dfsdfs</span></p>

<p class=MsoNormal style='text-align:justify'>Hi</p>

<p class=MsoNormal style='text-align:justify'><o:p>&nbsp;</o:p></p>

<p class=MsoNormal style='text-align:justify'><span
style='mso-tab-count:3'>                                    </span><span
class=SpellE>Jfdklas</span></p>

<p class=MsoNormal style='text-align:justify'><o:p>&nbsp;</o:p></p>

</div>

</body>

</html> 
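
For a file shaped like the one above, a tolerant HTML parser is another
option, since it sees each tag as one unit no matter how many lines it
spans. The following is only a sketch, again in Python for illustration,
using the standard library's html.parser module. The class name
TextExtractor, the helper html_to_text, and the choice of which tags to
skip or treat as paragraph breaks are assumptions made for this example.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring <head>, <style> and <script> content."""

    SKIP = {"head", "style", "script"}    # content of these is discarded
    BREAK = {"p", "div", "li", "tr"}      # closing one of these ends a line

    def __init__(self):
        super().__init__()                # character references are decoded by default
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BREAK:
            self._parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Source line breaks inside text are layout only, so fold all
            # whitespace (including non-breaking spaces) into single spaces.
            self._parts.append(re.sub(r"\s+", " ", data))

    def text(self):
        lines = (line.strip() for line in "".join(self._parts).split("\n"))
        return "\n".join(line for line in lines if line)


def html_to_text(source: str) -> str:
    parser = TextExtractor()
    parser.feed(source)
    parser.close()
    return parser.text()
```

Comments, including Word's conditional <!--[if gte mso 9]> blocks, are
reported to the parser as comments and are dropped because the class does
not collect them. Unknown tags such as <o:p> are passed through as ordinary
tags, so only their text content is kept.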

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
