On Mon, Feb 20, 2012 at 10:40, Nadav Har'El <n...@math.technion.ac.il> wrote:
> On Sun, Feb 19, 2012, Dotan Cohen wrote about "Re: Preparing to convince to
> shift to non-propriety documents formats":
>> Undocumented? Which file format is that? All the .doc and .docx
>> formats are documented, even the older binary formats.
>
> Where is the ".doc" format documented?
>
> I once wrote a tool to extract the text in MS Office files (for a search
> engine). It was a really annoying reverse-engineering-like
> trial-and-error process, and I could hardly find any documentation.
> The PowerPoint format (.ppt) was particularly odd.
>
> What documentation do you refer to?
>
Here are the pre-2007 formats:
http://msdn.microsoft.com/en-us/library/ff381461.aspx

And here are the current versions:
http://msdn.microsoft.com/en-us/library/cc313118.aspx

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il