Hi, Tim has a point about OpenOffice 2, but be aware that the beta version is buggy (I got tired of it bombing out on me and removed it until a more stable version is available). In particular, I found it nearly impossible to open large files (I have lots of Excel pivot table files in the 50-300MB range and some large Word files with embedded data). Complex Word files (graphics, tables, etc.) would often come out "funny".
So if you use that kind of tool in batch mode, I would make sure to "twin" every XML version with the original Word file, so that users can easily go back to the original if they find the converted version messed up. With thousands of files converted in batch mode, assume that some of them won't be looked at by a sober human for maybe 10 or 15 years.

Best regards
Calle

> -----Original Message-----
> From: Tim Churches [mailto:[EMAIL PROTECTED]
> Sent: 16 March 2005 06:49 PM
> To: openhealth-list@minoru-development.com
> Subject: Re: M$oft Word to XML or HTML conversion
>
> Daniel L. Johnson wrote:
> > Dear All,
> >
> > Anybody here know of a tool to convert MicroSoft Word files to XML or
> > HTML? We have a huge archive of Word files...
>
> What sort of XML? Ms-Word saves its documents as XML - but
> the DTD used is proprietary.
>
> As Ignacio said, MS Word can save as HTML, but the resulting
> HTML files are full of proprietary Microsoft extensions to
> HTML. MS-Word 2002 and later offer a choice to save as
> "filtered HTML" which is a bit cleaner, but still horrible.
>
> The best way to convert MS-Word files to an open
> standards-based XML format is to use a beta version of the
> forthcoming OpenOffice 2.0 - see http://www.openoffice.org/
> The beta versions work fine, and will save to the OASIS
> OpenDocument XML standards (see
> http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office ).
> Actually, I think OpenOffice 1.1.4 also allows you to save to
> OpenDocument format, but the OpenOffice 2.0 beta will do a
> better job at importing complex MS-Word documents (especially
> if they have nested tables).
>
> It should be easy to write a macro to automate the
> conversion, or you can drive OpenOffice from a Python script
> via PyUNO if you are keen.
>
> Tim C
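
P.S. Here is a rough sketch of the "twinning" I mean above, in Python since Tim mentioned scripting. It is only one way it could look. The names twin_batch and manifest.csv, the .odt extension and the convert callback are my own illustrative assumptions; convert is a placeholder for whatever really does the conversion (an OpenOffice macro, PyUNO, ...) and is not an OpenOffice API. It also assumes the output directory is not inside the source directory.

```python
import csv
import shutil
from pathlib import Path


def twin_batch(src_dir, out_dir, convert, ext=".odt"):
    """Convert every Word file under src_dir and keep the original beside it.

    convert(src, dst) is a caller-supplied placeholder for the real
    conversion step (hypothetical; not part of any OpenOffice API).
    Returns a list of (original, converted, status) rows, which are also
    written to out_dir/manifest.csv.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Match .doc regardless of case (.DOC is common in old archives).
    docs = sorted(
        p for p in src_dir.rglob("*")
        if p.is_file() and p.suffix.lower() == ".doc"
    )

    rows = []
    for doc in docs:
        rel = doc.relative_to(src_dir)
        original_copy = out_dir / rel          # the "twin": untouched original
        converted = original_copy.with_suffix(ext)
        original_copy.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(doc, original_copy)       # copy first, so it survives a crash
        try:
            convert(doc, converted)
            status = "ok"
        except Exception as exc:               # one bad file must not stop the batch
            status = "failed: %s" % exc
        rows.append((str(rel), str(converted.relative_to(out_dir)), status))

    # A manifest lets someone find the original years later without guessing.
    with open(out_dir / "manifest.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["original", "converted", "status"])
        writer.writerows(rows)
    return rows
```

The original is copied before the conversion is attempted, so even when the converter bombs out on a large or complex file, the Word file is still sitting next to where the converted version should be, and the manifest records the failure.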