Hi Jakob,

I just sent you two files.

From the Authoring App: Author/search/insert.xqy

And from the TK:
MarkLogic/Modules/MarkLogic/openxml/word-processing-ml-support.xqy

The issue is this: Office 2010 requires namespace declarations for the root 
element of certain documents within the .docx (or OPC) package, even though 
those namespaces might not be used by any elements of the document.

Your test2.docx includes footer#.xml, footnotes.xml, and endnotes.xml parts.
Those parts were all accounted for in OPC generation, but the code did not
programmatically add the namespace declarations that 2010 requires on their
root elements. We were already doing this for some parts, but not these three,
so I added it.
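A minimal sketch of that kind of fix, written in JavaScript for offline inspection of extracted parts (e.g. with Node.js). It is not the actual XQuery change in word-processing-ml-support.xqy. The prefix list in REQUIRED_NAMESPACES and the helper names are assumptions for illustration. It also checks one rule that, as we read the Markup Compatibility part of ECMA-376, seems relevant: every prefix named in mc:Ignorable (like the mc:Ignorable="w14" in the package XML quoted later in this thread) must itself be declared. It uses simple string matching on the root start tag, not a real XML parser.

```javascript
// Illustrative namespace list (assumption, not the Toolkit's actual list).
const REQUIRED_NAMESPACES = {
  w: "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
  r: "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
  mc: "http://schemas.openxmlformats.org/markup-compatibility/2006",
  w14: "http://schemas.microsoft.com/office/word/2010/wordml",
};

// Prefixes listed in mc:Ignorable on the given start tag, if any.
function ignorablePrefixes(startTag) {
  const m = startTag.match(/\smc:Ignorable="([^"]*)"/);
  return m ? m[1].split(/\s+/).filter(Boolean) : [];
}

// Hypothetical helper: returns partXml with any missing xmlns:prefix
// declarations added to the root element's start tag. Throws if mc:Ignorable
// names a prefix that is neither declared nor in the required list.
function ensureRootNamespaces(partXml, required = REQUIRED_NAMESPACES) {
  // First start tag that is not an XML declaration, PI, or comment.
  const rootMatch = partXml.match(/<(?![?!])[^>]*>/);
  if (!rootMatch) throw new Error("no root element found");
  const rootTag = rootMatch[0];
  const wanted = new Set([...Object.keys(required), ...ignorablePrefixes(rootTag)]);
  const additions = [];
  for (const prefix of wanted) {
    if (new RegExp(`\\sxmlns:${prefix}=`).test(rootTag)) continue; // already declared
    if (!Object.prototype.hasOwnProperty.call(required, prefix)) {
      throw new Error(`mc:Ignorable names undeclared prefix "${prefix}"`);
    }
    additions.push(` xmlns:${prefix}="${required[prefix]}"`);
  }
  if (additions.length === 0) return partXml;
  // Insert just before ">" (or before "/>" for an empty root element).
  const tagEnd = rootMatch.index + rootTag.length - (rootTag.endsWith("/>") ? 2 : 1);
  return partXml.slice(0, tagEnd) + additions.join("") + partXml.slice(tagEnd);
}
```

Running it twice is harmless: once every prefix is declared, the part comes back unchanged.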

Just an observation: your test2.docx is actually a 2007 document, so I'm
guessing you're working with an existing template. With the Addin and XQuery
API, 2007 content can be inserted into 2010 documents and, to some extent, vice
versa (the XQuery API doesn't account for new 2010 features, but most
organizations aren't going back and forth between flavors; YMMV).

With the updates provided, you should be in business! I was able to generate
OPC from the extracted test2.docx, round-trip components by inserting them from
searches, and even open the docs generated for 2010 in 2007. But if you run
into any issues, let me know.

>> And looking at it with some more care, I notice that it's not actually an 
>> XML DOM object as I thought it would be, but a string object with a "trim" 
>> function added (see screenshot). This may be intentional, but just in case 
>> ...

XML isn't a first-class data type for Microsoft here. When we feed Word XML,
the Word object model requires the "XML" as a string, so what you're seeing is
expected. That's also why you'll find xdmp:quote/xdmp:unquote throughout the TK
API and applications.
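Since Word wants a string, a defensive client-side helper could normalize whatever it holds before calling into Word. This is a sketch under assumptions: toOoxmlString is a hypothetical name, and the trim polyfill only illustrates why a string might show a "trim" function in the IE debugger (IE8's script engine has no native String.prototype.trim); we don't know that the add-in's trim comes from exactly this code. It sticks to ES3-style syntax so it could run in older IE.

```javascript
// Polyfill: IE8's script engine predates ES5 and has no String.prototype.trim.
if (!String.prototype.trim) {
  String.prototype.trim = function () {
    return this.replace(/^\s+|\s+$/g, "");
  };
}

// Hypothetical helper: return OOXML as a primitive string, whatever we hold.
function toOoxmlString(value) {
  if (typeof value === "string") return value;
  if (value instanceof String) return value.valueOf(); // String wrapper object
  if (value && typeof value.xml === "string") return value.xml; // e.g. MSXML DOM node
  if (value && value.nodeType && typeof XMLSerializer !== "undefined") {
    return new XMLSerializer().serializeToString(value); // standards DOM (IE9+)
  }
  throw new TypeError("expected OOXML as a string or an XML DOM node");
}
```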

>> PS: Maybe at the end of it all you tell me why the add-in is called "Oslo 
>> Information Panel". :)

Oslo's my dog. You'll only ever see that when you run the app in IE. Within the 
Addin it's not visible.  But you can always change/remove it.  Oslo makes his 
way into all my code at some point. :)

Have a good weekend,
Pete



-----Original Message-----
From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Jakob Fix
Sent: Friday, February 24, 2012 1:51 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Word add-in and Word 2010

Pete,

I tried to recreate the test scenario like so:

* I created a test2.docx (Blank document template), with a "section"
content control, a "requirement" content control and a disclaimer boilerplate. 
Please find it attached.
* I then saved this document (no fancy "save as", just "save") on my local disk 
under the name test2.docx.
* I then copied it to the WebDAV folder where it got promptly unzipped and 
indexed.
* I then created a new document from scratch (Blank document template).
* In the MarkLogic pane I searched for "test" (because this word appears on 
both content controls), and got two results.
* I clicked on the "Insert" button next to a search result and encountered the 
aforementioned error about not being able to insert at this point.

I also attach the extracted file as it appears in MarkLogic, i.e. the _parts 
directory structure, for comparison, to be sure the pipelines did what we asked 
them to.

Mmmh... Finally, I also attach what appears to be the "xml"
property of the "pkgxml" JavaScript object. And looking at it with some more
care, I notice that it's not actually an XML DOM object as I thought it would
be, but a string object with a "trim" function added (see screenshot). This may
be intentional, but just in case ...


cheers,
Jakob.

PS: Maybe at the end of it all you tell me why the add-in is called "Oslo 
Information Panel". :)


On Fri, Feb 24, 2012 at 17:43, Pete Aven <pete.a...@marklogic.com> wrote:
> Hi Jakob,
>
> I received the .zip. But can you please send me the source .docx this
> test.xml is generated from? Something else is going on here, so I'd like to
> try to recreate your issue.
>
> Also, I'm assuming this is the docx that you've saved to MarkLogic and that 
> you are trying to use as a search hit for insert into another document (for 
> testing purposes)?
>
> This test.xml document opens in Word 2010 in compatibility mode (which
> signifies a 2007 doc). If you round-trip a 2010 doc using the latest .xqy,
> this shouldn't be the case, at least in my testing. Also, simple docs don't
> generate footnotes for me. The code in word-processing-ml-support.xqy
> accounts for the footnotes.xml part in OPC generation, but maybe you've
> discovered a bug.
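For a quick check of which flavor a document is, one could look at /word/settings.xml. A hedged sketch: it assumes Word 2010 records <w:compatSetting w:name="compatibilityMode" ... w:val="14"/> inside w:compat there, and treats a document without that setting as 2007-flavored (mode 12). The function names are hypothetical and the regex matching is illustrative only.

```javascript
// Returns the compatibilityMode value from a settings.xml string, treating a
// missing setting as 12 (Word 2007). Assumption-based, see the lead-in.
function compatibilityMode(settingsXml) {
  // [^>]* cannot cross ">", so this only matches within a single compatSetting tag.
  const setting = settingsXml.match(/<w:compatSetting\b[^>]*\bw:name="compatibilityMode"[^>]*>/);
  if (!setting) return 12;
  const val = setting[0].match(/\bw:val="(\d+)"/);
  return val ? Number(val[1]) : 12;
}

// True if a Word version with the given mode (14 = Word 2010) would be
// expected to open the document in compatibility mode.
function opensInCompatibilityMode(settingsXml, wordMode = 14) {
  return compatibilityMode(settingsXml) < wordMode;
}
```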
>
>>>. Inside the /word/document.xml part, contents is simplified, for
>>>example, an element for signalling a spelling error has been removed,
>>>but otherwise it looks very much the same
>
> There's the XML Office will consume, and there's the XML it will produce. We
> aim to keep things simple when working with the formats and give Office the
> minimum XML it needs on ingest to still produce the desired result for the
> author, both in the active document and the next time the document is saved
> in Office.
>
>>>. The function ooxml:get-directory-package in the latest 
>>>word-processing-ml-support.xqy seems to take the different components in the 
>>>order as returned by cts:directory-query which makes me think that order is 
>>>not important.
>
> Order does not matter.
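Because order does not matter, a Word-generated package and a MarkLogic-generated one are easier to compare by their set of part names than line by line. A sketch, assuming both packages are available as strings with double-quoted pkg:name attributes; partNames and diffParts are hypothetical helpers.

```javascript
// Part names in document order, e.g. ["/word/document.xml", ...].
function partNames(packageXml) {
  const names = [];
  const re = /<pkg:part\b[^>]*\bpkg:name="([^"]+)"/g;
  let m;
  while ((m = re.exec(packageXml)) !== null) names.push(m[1]);
  return names;
}

// Compares the two packages as sets of part names, ignoring order.
function diffParts(packageA, packageB) {
  const a = new Set(partNames(packageA));
  const b = new Set(partNames(packageB));
  return {
    onlyInA: [...a].filter((n) => !b.has(n)).sort(),
    onlyInB: [...b].filter((n) => !a.has(n)).sort(),
  };
}
```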
>
>>>* I did not have the WordprocessingML Process pipeline activated.
> However, once activated the insertion still didn't succeed. (The description 
> of this pipeline indicates that it's about merging similar runs. I did notice 
> when comparing the XML, that w:r elements were merged, so I'd guess that 
> works).
>
> In case you're interested, the pipeline solves this problem:
> http://community.marklogic.com/blog/smallchanges/2007-12-18
>
> It really shouldn't matter whether it's activated; it was a guess at what XML
> might be tripping up OPC generation, since I didn't know what your docs
> looked like.
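The idea behind that pipeline, as described in this thread, is merging adjacent runs that share formatting. A simplified sketch: it models runs as { rPr, text } objects with rPr as a serialized string, which is an assumption. The real pipeline works on WordprocessingML inside MarkLogic, and real run properties would need a more careful equality test than string comparison.

```javascript
// Merges adjacent runs whose (serialized) run properties are identical.
function mergeRuns(runs) {
  const merged = [];
  for (const run of runs) {
    const last = merged[merged.length - 1];
    if (last && last.rPr === run.rPr) {
      last.text += run.text; // same formatting: extend the previous run
    } else {
      merged.push({ rPr: run.rPr, text: run.text }); // copy so the input is not mutated
    }
  }
  return merged;
}
```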
>
> Thanks again,
> Pete
>
>
>
> -----Original Message-----
> From: general-boun...@developer.marklogic.com
> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Jakob
> Fix
> Sent: Friday, February 24, 2012 5:58 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Word add-in and Word 2010
>
> Pete,
>
> thanks for your replies, I very much appreciate them.
>
> Using the Developer Tools in IE9, I saved the package XML that is about to be
> inserted into the current Word document and tried to open it in Word (before
> doing that, I added the processing instruction so that Windows launches Word
> instead of my XML editor). This failed, and Word tells me why:
>
> ---
> The file test.xml cannot be opened because there are problems with the 
> contents.
>
> Details
>
> The XML data is invalid according to the schema.
>
> Location: Part: /word/footnotes.xml, Line: 3, Column: 191
> ---
>
> It then offers to attempt to recover the contents, and the recovery succeeds.
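The two debugging steps used here could be scripted. A sketch with hypothetical helper names: addMsoApplicationPI prepends the <?mso-application progid="Word.Document"?> processing instruction (after the XML declaration, if any) so that Windows opens the saved file in Word, and contextAt shows the text around a reported line and column. It assumes you have the offending part (here /word/footnotes.xml) as a string with the same line breaks Word reported against, and that columns are 1-based.

```javascript
const MSO_PI = '<?mso-application progid="Word.Document"?>';

// Prepends the mso-application PI unless one is already present.
function addMsoApplicationPI(packageXml) {
  if (packageXml.includes("<?mso-application")) return packageXml;
  const decl = packageXml.match(/^<\?xml[^>]*\?>\s*/);
  // The PI has to come after the XML declaration when there is one.
  return decl
    ? decl[0] + MSO_PI + "\n" + packageXml.slice(decl[0].length)
    : MSO_PI + "\n" + packageXml;
}

// Text just before and at a 1-based (line, column) position.
function contextAt(partXml, line, column, radius = 40) {
  const lines = partXml.split(/\r?\n/);
  if (line < 1 || line > lines.length) throw new RangeError("line out of range");
  const text = lines[line - 1];
  const i = column - 1;
  return {
    before: text.slice(Math.max(0, i - radius), i),
    at: text.slice(i, i + radius),
  };
}
```

For the error above, one would call contextAt(footnotesXml, 3, 191) on the extracted footnotes part.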
>
> So it says quite clearly that the document is not valid according to its
> schema. I compared a basic OPC file created by Word with the one generated by
> MarkLogic (although one cannot expect them to be the same, as the generated
> one should only contain the part of a document in the database that has a
> search hit), and the main difference seems to be the order of the pkg:parts.
> Inside the /word/document.xml part, the content is simplified; for example,
> an element for signalling a spelling error has been removed, but otherwise it
> looks very much the same. The function ooxml:get-directory-package in the
> latest word-processing-ml-support.xqy seems to take the different components
> in the order returned by cts:directory-query, which makes me think that order
> is not important. But I don't have a schema handy to validate it.
>
>
> Regarding the checks:
> * Documents are saved OK via WebDAV. I can open them directly from Word, and,
> as you mentioned, hits are found. The extraction pipelines are also executed,
> as the _parts directory is created.
> * I'm using MarkLogic 5.0-2.
> * I have installed the latest version of word-processing-ml-support.xqy in
> Modules/MarkLogic/openxml.
> * I did not have the WordprocessingML Process pipeline activated.
> However, once activated the insertion still didn't succeed. (The description 
> of this pipeline indicates that it's about merging similar runs. I did notice 
> when comparing the XML, that w:r elements were merged, so I'd guess that 
> works).
>
> So, in summary, the package XML retrieved from MarkLogic contains the
> different parts in a different order from the one Word uses.
> Otherwise, I cannot see any differences. For information, I added the XML as a
> zip file to this mail. If it doesn't make it through to the list, I'll send
> it to you off-list.
>
> cheers,
> Jakob.
>
>
>
> On Thu, Feb 23, 2012 at 19:24, Pete Aven <pete.a...@marklogic.com> wrote:
>> Jakob!
>>
>>>>These documents are DOCX and were created by me when playing around with 
>>>>the tool kit and saved directly to MarkLogic via WebDAV. Now, given that 
>>>>the error message is the same as above and it was inappropriate there, I 
>>>>wonder what the reason might be here.
>>
>> The error message usually indicates that Word doesn't like the XML you are
>> trying to insert, so something may be wrong with the XML created for insert.
>>
>> Things to check:
>>
>> 1) Verify that the documents are indeed saved to ML after using WebDAV.
>>        WebDAV often does not work properly, especially on Windows. I'm
>> assuming it works since you get search results, but just in case.
>>
>> 2) Check that the Office Open XML Extract and WordprocessingML Process
>> pipelines are enabled.
>>        I know the former is, but the latter?
>>        For WordprocessingML Process, which version of the server are you
>> using? 5.0 supports the 2010 format, but previous versions do not. Let me
>> know if you are using an earlier version and I can forward the appropriate
>> files (there are two .xqy files; they're small).
>>
>> 3) Did you copy over the latest version of word-processing-ml-support.xqy
>> that I sent you in the .zip to <server-root>/Modules/MarkLogic/openxml?
>>        This latest copy has support for the 2010 flavor of WordprocessingML,
>> whereas the one downloadable from Community does not.
>>
>>>> Interestingly enough, I'm not getting any results for words appearing in 
>>>> the boilerplate documents, are they excluded from the search?
>>
>> When you enrich a document in Word, it adds what are called 'Content 
>> Controls' around the selected sections within the Word application.  In the 
>> XML, these manifest themselves as Structured Document Tags; w:sdt elements.
>>
>> Searches are performed against any text found within child elements of w:sdt.
>>
>> When you insert, the search hit (the w:sdt from the source document) is
>> formatted using the XQuery API into a Word document in the OPC format (at
>> least, it should be when everything is working correctly). This OPC
>> document is then inserted into the active document through
>> MLA.insertWordOpenXML().
>>
>> The boilerplates probably don't have any content within w:sdt tags and 
>> therefore are not showing up in searches.
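To see roughly what text a search would match, here is a sketch that collects w:t text inside w:sdt elements, one string per outermost content control. Assumptions: the input is a document.xml string using the w: prefix, and a lightweight tag scanner is enough for illustration (it does not handle CDATA or decode entities, so it is no substitute for an XML parser). This is not how Author/search/search.xqy is implemented; sdtText is a hypothetical name.

```javascript
// Returns the w:t text found inside each outermost w:sdt, in document order.
function sdtText(documentXml) {
  // Either a tag (closing slash, name, self-closing slash) or a text chunk.
  const tokens = /<(\/?)([A-Za-z_][\w.:-]*)[^>]*?(\/?)>|([^<]+)/g;
  const found = [];
  let sdtDepth = 0;   // how many w:sdt elements we are currently inside
  let inText = false; // inside a w:t element
  let current = "";
  let m;
  while ((m = tokens.exec(documentXml)) !== null) {
    const [, closing, name, selfClosing, text] = m;
    if (text !== undefined) {
      if (inText && sdtDepth > 0) current += text;
      continue;
    }
    if (selfClosing) continue; // e.g. <w:t/> or <w:alias .../>: no content
    if (name === "w:sdt") {
      sdtDepth += closing ? -1 : 1;
      if (closing && sdtDepth === 0 && current) {
        found.push(current);
        current = "";
      }
    } else if (name === "w:t") {
      inText = !closing;
    }
  }
  return found;
}
```

Text outside any content control (such as a boilerplate paragraph without a w:sdt) is skipped, which matches the behavior described above.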
>>
>> You can of course change the search by modifying
>> Author/search/search.xqy, but let's not go there until we sort out
>> insert. :)
>>
>> Hope this helps,
>> Pete
>>
>> -----Original Message-----
>> From: general-boun...@developer.marklogic.com
>> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Jakob
>> Fix
>> Sent: Thursday, February 23, 2012 12:15 PM
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] Word add-in and Word 2010
>>
>> Hi Pete,
>>
>> OK, I grok the boilerplate functionality now (I somehow expected the files 
>> to already exist behind the buttons). The error message about "XML markup 
>> [that] cannot be inserted in the specified location" was kind of misleading. 
>> But that's cool.  I've created a couple of documents with different styles 
>> and they are maintained on insert, which is what you would expect when you 
>> know it's actually the OPC XML that's being copied and pasted, but still 
>> nice.
>>
>> We're making progress, thanks a lot. :)
>>
>> Next up is search: My search finds hits in docx documents right now, and the 
>> debug alert about the contents of the XML about to be inserted shows OPC 
>> XML, here's a bit:
>>
>> PACKAGE XML IS <pkg:package
>> xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage"><pkg:part
>> pkg:name="/word/glossary/fontTable.xml"
>> pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"><pkg:xmlData><w:fonts
>> mc:Ignorable="w14"
>> xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
>> xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" ...
>>
>> These documents are DOCX and were created by me when playing around with the 
>> tool kit and saved directly to MarkLogic via WebDAV. Now, given that the 
>> error message is the same as above and it was inappropriate there, I wonder 
>> what the reason might be here. Clearly, this is not about the cursor being 
>> at the wrong position. By the way, the "Open" button for each search result 
>> works fine and opens the document as expected.  One of the search results is 
>> a "Section" the other one a "Policy".
>>
>> Interestingly enough, I'm not getting any results for words appearing in the 
>> boilerplate documents, are they excluded from the search?
>>
>> cheers,
>> Jakob.
>>
>>
>>
>> On Thu, Feb 23, 2012 at 14:54, Pete Aven <pete.a...@marklogic.com> wrote:
>>> Hi Jakob,
>>>
>>> Are you trying to insert from the boilerplate tab, from a search hit, or 
>>> both?
>>>
>>> To test boilerplate: save a document as XML from Word (just as XML, not
>>> 2003 XML), save it to the database, and reference it in the config file
>>> found at Author/config/boilerplate.xml.
>>>
>>> Documents saved as XML are saved in what Microsoft calls OPC format. See 
>>> http://community.marklogic.com/blog/smallchanges/2009-01-08 for more 
>>> details.
>>>
>>> Then restart Word, place your cursor somewhere in the document, go to the
>>> boilerplate tab in the application, and click the button for the
>>> boilerplate you just added.
>>>
>>> You'll see that the code for boilerplate insert fetches the document from 
>>> the Server and passes it to insertWordOpenXML() which inserts it at the 
>>> current cursor location.  If this works, we're on the right track.
>>>
>>> The insert function behind the button on a search hit takes a component
>>> found in a search (a component being anything previously enriched from the
>>> enrich tab in the Authoring application and saved to MarkLogic) and uses
>>> the XQuery API to format it as OPC, before inserting it into the doc using
>>> the insertWordOpenXML() function.
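For reference, a sketch of the smallest flat-OPC package one might wrap a search hit in before calling insertWordOpenXML(): a package relationships part plus the main document part, each with its content type given on pkg:part. minimalFlatOpc is a hypothetical helper; the Toolkit's XQuery code builds a fuller package and accounts for other parts such as footnotes.xml, which this sketch ignores.

```javascript
const W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Wraps WordprocessingML body content (e.g. a w:sdt) in a minimal flat-OPC package.
function minimalFlatOpc(bodyXml) {
  return [
    '<?xml version="1.0" standalone="yes"?>',
    '<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">',
    // Package-level relationships: point at the main document part.
    '<pkg:part pkg:name="/_rels/.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml">',
    '<pkg:xmlData><Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">',
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>',
    "</Relationships></pkg:xmlData></pkg:part>",
    // Main document part containing the content to insert.
    '<pkg:part pkg:name="/word/document.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">',
    `<pkg:xmlData><w:document xmlns:w="${W_NS}"><w:body>${bodyXml}</w:body></w:document></pkg:xmlData></pkg:part>`,
    "</pkg:package>",
  ].join("\n");
}
```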
>>>
>>> Are you starting with existing docs?  Or docs from SharePoint?
>>> These may have XML elements we haven't seen yet that aren't
>>> accounted for in the XQuery API and so may cause an issue. You may
>>> want to start by Authoring new docs to test the functionality, then
>>> hammer it with your existing docs to break it. :)
>>>
>>> Hope this helps,
>>> Pete
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: general-boun...@developer.marklogic.com
>>> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Jakob
>>> Fix
>>> Sent: Thursday, February 23, 2012 8:39 AM
>>> To: MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] Word add-in and Word 2010
>>>
>>> After installing the Author sample application, which is really
>>> rather cool, insertion into the current document still doesn't
>>> work. While trying to understand why, I noticed that the insert
>>> functionality expects the URI lexicon to be enabled, which wasn't
>>> mentioned in the documentation, but enabling it got me one step
>>> further. Now it seems that the items found cannot be inserted
>>> anywhere in the current document. By the way, I stuck with the
>>> defaults for now (i.e. Policies, Sections, Recommendations).
>>> This is the error message: "ERROR error: XML markup cannot be inserted
>>> in the specified location.", which I was able to track down to this
>>> function in MarkLogicWordAddin:
>>>
>>> line 1368 MLA.insertWordOpenXML = function(opc_xml)
>>>
>>> and more particularly that line:
>>>
>>> line 1381 window.external.insertWordOpenXML(v_docx);
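Since alert()s are the main debugging tool inside the Addin, a small wrapper can log what is being inserted and catch the failure. A sketch under assumptions: it takes the add-in's MLA object and a log function (e.g. alert) as parameters so it can be exercised outside Word, and it assumes the insert error surfaces as a thrown exception, which may not be how the add-in actually reports it. tryInsertWordOpenXML is a hypothetical name; ES3-style syntax is used so it could run in older IE.

```javascript
// Hypothetical debugging wrapper around mla.insertWordOpenXML(opcXml).
function tryInsertWordOpenXML(mla, opcXml, log) {
  if (typeof opcXml !== "string") {
    log("insert skipped: expected OPC XML as a string, got " + typeof opcXml);
    return false;
  }
  // Show the start of what is about to be handed to Word.
  log("inserting package: " + opcXml.slice(0, 200));
  try {
    mla.insertWordOpenXML(opcXml);
    return true;
  } catch (e) {
    // e.g. "XML markup cannot be inserted in the specified location."
    log("insertWordOpenXML failed: " + (e && e.message ? e.message : String(e)));
    return false;
  }
}
```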
>>>
>>> Glad for any ideas
>>>
>>> cheers,
>>> Jakob.
>>>
>>>
>>>
>>> On Wed, Feb 22, 2012 at 17:59, Jakob Fix <jakob....@gmail.com> wrote:
>>>> Thanks Pete,
>>>>
>>>> That's extra quick! :)
>>>> I got the zip. and am updating the msi as we speak.
>>>>
>>>> cheers,
>>>> Jakob.
>>>>
>>>>
>>>>
>>>> On Wed, Feb 22, 2012 at 17:45, Pete Aven <pete.a...@marklogic.com> wrote:
>>>>> Hi Jakob,
>>>>>
>>>>>>>1) this add-in is supported for Word 2010
>>>>>
>>>>> Though the Addin will install with Office 2010, the XQuery API in the
>>>>> Toolkit that is currently available on the Community site is only
>>>>> compatible with the 2007 flavor of WordprocessingML.
>>>>>
>>>>> The TK has been updated for 2010 support and is currently sitting in a 
>>>>> repository where I'm told it will be released onto the unsuspecting, 
>>>>> Office 2010-hungry masses at some point in the future.  Until then, I've 
>>>>> sent you a snapshot of the latest TK to your gmail.
>>>>>
>>>>>>>2) if so how can one debug this Javascript code (is there a Firebug-like 
>>>>>>>tool for this?).
>>>>>
>>>>> Unfortunately, not really. Develop everything you can for the Addin
>>>>> application outside the context of the Addin (in IE). You can use IE8,
>>>>> which has developer tools similar to Firebug. Once the application is in
>>>>> the Addin and calling the MLA functions, however, your only real option is
>>>>> to use alert()s (or write logs to the filesystem, which you can do with
>>>>> JavaScript in IE).
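A sketch of the filesystem option in IE: it appends lines to a text file through the Scripting.FileSystemObject ActiveX object and falls back to a caller-supplied function (such as alert) when ActiveX is unavailable or blocked. The log path and the makeLogger name are assumptions; ES3-style syntax is used so it could run in older IE.

```javascript
// Hypothetical logger factory: appends to a text file in IE, else falls back.
function makeLogger(path, fallback) {
  var FOR_APPENDING = 8; // OpenTextFile iomode: append
  return function (message) {
    var line = String(new Date()) + " " + message;
    try {
      // ActiveXObject exists only in Internet Explorer (and may be blocked).
      var fso = new ActiveXObject("Scripting.FileSystemObject");
      var file = fso.OpenTextFile(path, FOR_APPENDING, true); // true: create if missing
      file.WriteLine(line);
      file.Close();
    } catch (e) {
      fallback(line);
    }
  };
}
```

Usage would be along the lines of var log = makeLogger("C:\\temp\\addin.log", alert); followed by log("about to insert").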
>>>>>
>>>>>>> MLA.insertBlockContent(response.responseXML);
>>>>>
>>>>> This function really should be deprecated.  Instead of the simple
>>>>> Sample, I'd suggest using the Sample Authoring App to
>>>>> enrich/insert content, and taking a look at the function 
>>>>> MLA.insertWordOpenXML().
>>>>> Once you grok this function, you will keep Word in a headlock and
>>>>> pretty much have your way with it. :)
>>>>>
>>>>> Hope this helps,
>>>>> Pete
>>>>>
>>>>> -----Original Message-----
>>>>> From: general-boun...@developer.marklogic.com
>>>>> [mailto:general-boun...@developer.marklogic.com] On Behalf Of
>>>>> Jakob Fix
>>>>> Sent: Wednesday, February 22, 2012 11:24 AM
>>>>> To: General Mark Logic Developer Discussion
>>>>> Subject: [MarkLogic Dev General] Word add-in and Word 2010
>>>>>
>>>>> Hello again,
>>>>>
>>>>> I've successfully installed the Word add-in and am able to search using
>>>>> the sample provided in the download.
>>>>>
>>>>> However, double-clicking a found paragraph does not insert it
>>>>> into the currently open Word document. Probably, things have
>>>>> changed from 2007 to 2010. Looking at the JavaScript code in
>>>>> Samples/search/search.js, I find this line:
>>>>>
>>>>> MLA.insertBlockContent(response.responseXML);
>>>>>
>>>>> which seems to be responsible for the insertion of the paragraph.
>>>>>
>>>>> So, I guess my question is whether 1) this add-in is supported for Word 
>>>>> 2010 and 2) if so how can one debug this Javascript code (is there a 
>>>>> Firebug-like tool for this?).
>>>>>
>>>>> cheers,
>>>>> Jakob.
>>>>> _______________________________________________
>>>>> General mailing list
>>>>> General@developer.marklogic.com
>>>>> http://developer.marklogic.com/mailman/listinfo/general