Is there doco on the guts of Word documents? I know the files start with a 'magic number', but I've never needed to go past that.
On Tue, May 31, 2011 at 7:55 PM, Ian Thomas <il.tho...@iinet.net.au> wrote:

> An explanation I received is as follows:
>
> The ProtectedForForms property is set by default but (is) only relevant if the *document* ProtectionType is wdProtectionType.wdAllowOnlyFormFields. If the ProtectedForForms property has ever been explicitly set on a section, it will remain set after the protection is removed from the document but (it will) be meaningless.
>
> I’m not sure if this applies to Word 2003 (Office 11.0) only, but I suspect that is so.
> ------------------------------
> Ian Thomas
> Victoria Park, Western Australia
> ------------------------------
>
> *From:* ozdotnet-boun...@ozdotnet.com [mailto:ozdotnet-boun...@ozdotnet.com] *On Behalf Of* Ian Thomas
> *Sent:* Tuesday, May 31, 2011 12:09 PM
> *To:* 'ozDotNet'
> *Subject:* RE: Word VSTO question
>
> Hi James - thanks for the suggestion. The docx package and API is certainly a lot more explicit.
>
> I won’t have a need to create docx documents, but (see later) the security level set by VSTO for different Word versions may be a consideration.
>
> I’ve got over the practical problem, but I would like to dispel some of my ignorance of the Word object model and understand it a little more, before going further into a VSTO solution.
>
> The documents are public data, and were created between 2008 and 2010, are all .doc, and can be opened with Word 2003. When opening any of these documents, they have a macro code file (protected), but no document password protection.
>
> When I open them, I see just a Section Break for the “not-protected” documents -
>
> For the docs with a protected section, the Section Break is displayed like this -
>
> However, identifying which documents have a protected Section isn’t relevant, I have found.
>
> Using “Ask Cindy” (Cindy Meister, Word MVP and moderator on one of the Word forums) last night, I was alerted to the fact that the protected section *can be copied*, so that jumps over the problem of identifying protected sections. I can just ignore that, and parse the text data within my application.
>
> But I’m curious to understand why documents which have one Section protected, and those that have nothing protected, both show Section #1 with its .ProtectedForForms property True.
>
> I’m guessing that this is due to a Macro that is in *every* document. On (manually) opening with Word 2003, if I “Disable Macros” I can see a (password-protected) Macro “autoopen” in what I had called the “unprotected” documents, but if I “Enable Macros” then I see the “End of Protected Section” adornment on the Section Break (second pic, above).
>
> For the “protected” documents (Section 1 showing “End of Protected Section” whichever security mode is chosen), the Tools>Macros>*Macros (Alt-F8)* menu choice is greyed.
>
> So I suspect that the explanation is that I need to explicitly set the security level for opening Word docs in my code - ie, by default it must be Low (whereas for testing by manually opening the docs in the installed Word 2003, I have it set to Medium). If the security is LOW then the code would detect Section #1 as a protected Section.
>
> I am using the Word 11.0 interop currently, and I'm wondering if the "security setting" (?) is more rigorous by default in Word 12.0 and 14.0.
> ------------------------------
> Ian Thomas
> Victoria Park, Western Australia
> ------------------------------
>
> *From:* ozdotnet-boun...@ozdotnet.com [mailto:ozdotnet-boun...@ozdotnet.com] *On Behalf Of* James Chapman-Smith
> *Sent:* Tuesday, May 31, 2011 10:29 AM
> *To:* ozDotNet
> *Subject:* RE: Word VSTO question
>
> Hi Ian,
>
> I couldn’t see a difference in the file format for protected or non-protected documents. I got “Microsoft Word 97-2003 Document” for `.doc` and “Microsoft Word Document” for `.docx` though. Is what you’re seeing based on the file extension or definitely on the protection status?
>
> Assuming that you can’t tell without opening the files, here’s what I’d do.
>
> Using a machine with Word 2007 or 2010 on it, I would use VSTO to run through each of the 50,000+ documents and convert all `.doc` format files to `.docx` (in a temporary folder, of course) and then use `System.IO.Packaging` to open each file and look at the `~\word\settings.xml` stream within the file and see if it contains a `<w:documentProtection />` node (or similar).
>
> Would that work for you?
>
> Cheers.
>
> James.
>
> *From:* ozdotnet-boun...@ozdotnet.com [mailto:ozdotnet-boun...@ozdotnet.com] *On Behalf Of* Ian Thomas
> *Sent:* Monday, 30 May 2011 22:13
> *To:* gl...@esbconsult.com; 'ozDotNet'
> *Subject:* Word VSTO question
>
> I have 50000+ short Word documents, a proportion of which have a small protected section. As a first pass, I need to identify which of the files have a protected section. Can anyone help me with how to do that?
>
> On the basis of a sample of one of each, the Word file format is “Microsoft Word 97-2003 Document” for the files without a protected section, and “Microsoft Word Document” for those that do have a protected section. (the machine I inspected these with has only Office 2003 installed).
> ------------------------------
> Ian Thomas
> Victoria Park, Western Australia

-- 
Meski
"Going to Starbucks for coffee is like going to prison for sex. Sure, you'll get it, but it's going to be rough" - Adam Hills
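
For anyone who wants to try James's `settings.xml` idea without firing up Word, here is a rough sketch. It is written in Python with only the standard library rather than `System.IO.Packaging`, and it only applies once the `.doc` files have been converted to `.docx`. The function names and the stub package are made up for illustration. The attribute check (`w:edit="forms"` with `w:enforcement` switched on) is my reading of how forms protection is stored in WordprocessingML, so treat it as an assumption to verify against a couple of real converted documents.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/settings.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def document_protection(docx):
    """Return the <w:documentProtection> attributes as a dict, or None if absent.

    `docx` may be a file path or a file-like object containing a .docx package.
    """
    with zipfile.ZipFile(docx) as package:
        try:
            settings_xml = package.read("word/settings.xml")
        except KeyError:
            return None  # the package has no settings part at all
    root = ET.fromstring(settings_xml)
    node = root.find(f"{{{W_NS}}}documentProtection")
    if node is None:
        return None
    # Strip the "{namespace}" prefix from attribute names, e.g. "{...}edit" -> "edit".
    return {key.split("}")[-1]: value for key, value in node.attrib.items()}


def is_forms_protected(docx):
    """True when forms-only protection is present and enforced (assumed encoding)."""
    protection = document_protection(docx)
    if protection is None:
        return False
    enforced = protection.get("enforcement") in ("1", "true", "on")
    return protection.get("edit") == "forms" and enforced


def make_stub_package(protection_xml=""):
    """Hypothetical helper: build a minimal in-memory zip with only word/settings.xml.

    This is NOT a valid Word document; it exists purely to exercise the functions above.
    """
    settings = f'<w:settings xmlns:w="{W_NS}">{protection_xml}</w:settings>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/settings.xml", settings)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    stub = make_stub_package('<w:documentProtection w:edit="forms" w:enforcement="1"/>')
    print(is_forms_protected(stub))
```

Note that this only answers the document-level question. Per the explanation Ian quoted, a section's ProtectedForForms flag means nothing unless the document itself is protected for forms, so the document-level check seems the sensible first pass.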
<<image001.jpg>>
<<image002.jpg>>