Thanks for all of the links and info. If I run into this again I will know what to do.
In the meantime, I found a Firefox extension called UnMHT that did a pretty damn good job of rendering the original file from the MCC website.

-----Original Message-----
>From: Joseph Sinclair <plug-discuss...@stcaz.net>
>Sent: May 4, 2009 4:44 PM
>To: Main PLUG discussion list <plug-discuss@lists.plug.phoenix.az.us>
>Subject: Re: how to sanitize MS Word HTML output?
>
>Default save-as for many Outlook (full) installations is MHT, which is an
>MS-specific MIME archive format, sort of.
>Usually, you can run the extracted source through a mime decoder to get a
>message-plus-attachments output, and then pull the HTML doc from there.
>Once you have "clean" MSHTML (it's not HTML, it's an MS-specific XML format
>that just looks close enough to HTML that browsers can figure it out in quirks
>mode), then you can usually pass it through one of several "cleaner" apps
>available. All work to some extent, but none are perfect... Tidy is probably
>the most complete, but it can be a bit of a pain to get all the options to
>what you want.
>
>Links:
>  Using HTML tidy from [http://tidy.sourceforge.net/] with the "word-2000"
>  configuration option set to "yes" will go to great lengths to remove
>  MS-Word garbage while doing all of its other nifty cleanups of the HTML.
>Other options:
>  Quick cleaner written in C# (Requires Mono)
>    [http://www.codinghorror.com/blog/archives/000485.html]
>  Javascript-based cleaner
>    [http://ethilien.net/websoft/wordcleaner/cleaner.htm]
>  Service to clean docs, may store and retain documents, so don't use for
>  anything you care about [http://www.wordhtmlcleaner.co.uk/]
>  Another service, only for small documents [http://textism.com/wordcleaner/]
>
---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
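
For anyone who hits this later, the "run it through a mime decoder and pull the HTML doc" step from the quoted message could look roughly like the sketch below. It uses only Python's standard email package. The function name extract_html_from_mht is made up for illustration, and the sketch assumes the MHT file is an ordinary MIME message whose first text/html part is the document you want; real Outlook or Word output may need more handling than this.

```python
import email
from email import policy


def extract_html_from_mht(mht_bytes: bytes) -> str:
    """Return the first text/html part of an MHT (MIME HTML) archive.

    Hypothetical helper for illustration: it assumes the first
    text/html part found is the main document.
    """
    # policy.default yields EmailMessage objects, whose get_content()
    # undoes the transfer encoding (quoted-printable, base64) and
    # decodes the text using the part's declared charset.
    msg = email.message_from_bytes(mht_bytes, policy=policy.default)

    # walk() visits the top-level message and then every nested part.
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            return part.get_content()

    raise ValueError("no text/html part found in MHT data")
```

To use it, read the saved file in binary mode, pass the bytes to extract_html_from_mht, and write the returned string out as an .html file. That output is still the MS-flavoured markup, so it would then go through Tidy with "word-2000" set to "yes" (or one of the other cleaners listed above).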