Hi Hari,

What Tidy options are you using? Given the amount of markup that has to be chopped out of Word output, some reformatting of the source is unavoidable. But the result should render much the same in a browser.

If you use the "clean" or "drop-font-tags" options, almost all presentation data will be dropped. See


You will probably want to add a stylesheet to set fonts, etc. You will need to do that on your own. Sed works well on Tidy output, but is shaky on arbitrary markup.
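
As a rough illustration of the two suggestions above, here is a minimal Python sketch rather than a tested recipe. It assumes a `tidy` executable is on the PATH and that it accepts the "clean" and "drop-font-tags" options in `--option value` form (newer Tidy releases may have renamed or dropped some options). The function names, the `site.css` file name, and the use of Python in place of sed are all hypothetical choices made for the example.

```python
import re
import subprocess


def build_tidy_command(input_path):
    """Return an argv list that runs Tidy with the two options named above."""
    return [
        "tidy",
        "--clean", "yes",           # replace presentational markup with style rules
        "--drop-font-tags", "yes",  # discard <font> and <center> tags
        input_path,
    ]


def run_tidy(input_path):
    """Run Tidy and return the cleaned HTML it writes to standard output."""
    result = subprocess.run(
        build_tidy_command(input_path), capture_output=True, text=True
    )
    # Tidy exits with 0 on success, 1 if there were warnings, 2 on errors.
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.stdout


def add_stylesheet_link(html, href="site.css"):
    """Insert a stylesheet link before the first </head>, if there is one."""
    link = f'<link rel="stylesheet" type="text/css" href="{href}">\n'
    # A function replacement avoids any backslash handling in the link text.
    return re.sub(
        r"</head>",
        lambda match: link + match.group(0),
        html,
        count=1,
        flags=re.IGNORECASE,
    )
```

Something like `add_stylesheet_link(run_tidy("word.html"))` would then chain the two steps. The simple `</head>` match is only reasonable because Tidy output is regular; on arbitrary markup it would be as shaky as sed.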

Also, which version of Word and Tidy?

Charles Reitzel

At 03:51 PM 1/3/2003 -0800, Hari M wrote:

What is the best way to get MS Word to HTML?

I have a text box that users can use to enter information to upload to their website. Normally users copy and paste from MS Word. I use a WYSIWYG rich text box editor that can accept most MS Word formats.

I tried using HTML Tidy as an option to remove the clutter that Word inserts, but it messes up the formatting.

Is the best option to convert MS Word to XML and then to HTML?

I posted a similar question earlier but it did not appear on the list. My apologies if this appears twice.




