Hi Sorabh,
 
You need a CPF (Content Processing Framework) license key to convert
MS Office or PDF documents to XML.
 
You can then write your search query (see the cts:search API) against
the schema of the resulting XML, which is normally DocBook.
 
Regards
Venkatesh M S
 
 

________________________________

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Sukhendra Rai
Sent: Wednesday, June 13, 2007 5:52 PM
To: [email protected]
Cc: Sorabh Jerath
Subject: [MarkLogic Dev General] Require suggestions to load and search
worddocs



Hi,

 

I am familiarizing myself with MarkLogic Server and XQuery.

I have to store (load) Word documents in the server.

I want to search these documents for particular keywords.

 

I would appreciate suggestions on the best way to load and search
these documents in MarkLogic Server.

 

Going through chapter 11 of the developer guide, I found three formats:
XML, binary, and text. I used xdmp:document-load to load the .doc files.
If I use XML or text in the <format> parameter of xdmp:document-load, an
error is generated stating that my document is not in UTF-8 format,
while it works fine with the binary format. In my opinion, a Word
document stored in binary format cannot be searched efficiently.
xdmp:document-load does not seem to convert the document automatically
from another type to XML. Is there a function that does this?

 

I found the xdmp:word-convert function, which converts a Word document
to XHTML. If I need to store the .doc files as XHTML for better search
performance, do I need to convert them first and then store them in the
server?

 

Thanks,

Sukhendra Rai

 




