Thanks for all of the links and info. If I run into this again I will
know what to do.

In the meantime, I found a Firefox extension called UnMHT that did a pretty
damn good job of rendering the original file from the MCC website.

-----Original Message-----
>From: Joseph Sinclair <plug-discuss...@stcaz.net>
>Sent: May 4, 2009 4:44 PM
>To: Main PLUG discussion list <plug-discuss@lists.plug.phoenix.az.us>
>Subject: Re: how to sanitize MS Word HTML output?
>
>Default save-as for many Outlook (full) installations is MHT, which is an 
>MS-specific MIME archive format, sort of.
>Usually, you can run the extracted source through a mime decoder to get a 
>message-plus-attachments output, and then pull the HTML doc from there.
>Once you have "clean" MSHTML (it's not HTML, it's an MS-specific XML format 
>that just looks close enough to HTML that browsers can figure it out in quirks 
>mode), then you can usually pass it through one of several "cleaner" apps 
>available.  All work to some extent, but none are perfect...  Tidy is probably 
>the most complete, but it can be a bit of a pain to get all the options to 
>what you want.
>
>Links:
>  Using HTML tidy from [http://tidy.sourceforge.net/] with the "word-2000" 
> configuration option set to "yes" will go to great lengths to remove
>  MS-Word garbage while doing all of its other nifty cleanups of the HTML.
>Other options:
>  Quick cleaner written in C# (Requires Mono) 
> [http://www.codinghorror.com/blog/archives/000485.html]
>  JavaScript-based cleaner 
> [http://ethilien.net/websoft/wordcleaner/cleaner.htm]
>  Service to clean docs, may store and retain documents, so don't use for 
> anything you care about [http://www.wordhtmlcleaner.co.uk/]
>  Another service, only for small documents [http://textism.com/wordcleaner/]
>
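For anyone who wants to try the "run it through a mime decoder" step described above, here is a rough sketch using Python's standard email module. It assumes the saved .mht file is a well-formed MIME message with a text/html part, which may not hold for every file Outlook or Word produces. The function name extract_html_from_mht and the file names in the usage comment are made up for illustration.

```python
import email
from email import policy


def extract_html_from_mht(mht_bytes: bytes) -> str:
    """Return the first text/html part found in an MHT (MIME HTML) archive.

    Hypothetical helper: assumes the input parses as a normal MIME message.
    """
    msg = email.message_from_bytes(mht_bytes, policy=policy.default)
    # walk() visits the top-level message and every nested part in order.
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes the transfer encoding (quoted-printable
            # or base64) and decodes the text with the part's declared charset.
            return part.get_content()
    raise ValueError("no text/html part found in the archive")


# Example usage (file names are placeholders):
#
#   with open("saved-page.mht", "rb") as src:
#       html = extract_html_from_mht(src.read())
#   with open("saved-page.html", "w", encoding="utf-8") as dst:
#       dst.write(html)
```

The extracted HTML is still the Word-flavored markup, so it would then go through Tidy with word-2000 set to yes, or one of the other cleaners listed above.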



---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
