Hi Danny,

Thanks for the information. I understand that with enrich/entity-highlight I can enrich XML content using the entity types already defined in MarkLogic. I would like to know how I can add new entity types and their values, or customize the existing entity types. For example, suppose my project is about newspapers. I need metadata about commodity (e.g. gas, electricity), subject area, geography, news sources, and so on, so I need to insert my own entity type definitions into MarkLogic. It would be great if you could let me know how I can do that.

Secondly, does MarkLogic have any plugin that takes DITA-format XML as input and returns DITA-format XML as output with enriched metadata? I may have XML that conforms to DITA. The entity-highlight or enrich function adds some tags to the content, and these new tags may not conform to the DITA XSD. So I was wondering whether there is any built-in function or plugin for enriching DITA XML.

I did not understand the concept of pipelines for third-party technologies like Temis. I guess they also access and invoke the same enrich function of MarkLogic. How can they provide different functionality?

regards,
Saptarshi Das
Tata Consultancy Services
20 Ryan Ranch Road
Monterey - 93940, California
United States
Mailto: [email protected]
Website: http://www.tcs.com

[email protected]
Sent by: [email protected]
07/06/2009 02:44 PM
Please respond to [email protected]
To: [email protected]
cc:
Subject: General Digest, Vol 61, Issue 10

Message: 2
Date: Mon, 6 Jul 2009 11:52:23 -0700
From: Danny Sokolsky <[email protected]>
Subject: RE: [MarkLogic Dev General] Enrichment of content
To: General Mark Logic Developer Discussion <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"

Hi Saptarshi,

If the XML from entity:enrich does not suit your needs, it sounds like you will need to use cts:entity-highlight to define the XML based on your own taxonomy:

http://developer.marklogic.com/pubs/4.1/apidocs/SearchBuiltins.html#cts:entity-highlight

You can use cts:entity-highlight to write a function to transform the entity markup to return whatever you need.
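
As a rough illustration of the kind of transform described above, here is a language-neutral sketch in Python. It is not MarkLogic code and does not call cts:entity-highlight. The taxonomy contents and the function names (TAXONOMY, build_pattern, enrich) are hypothetical, made up for this example. It assumes the enriched terms may be wrapped in the DITA <ph> element with an outputclass attribute, so that the output stays within markup DITA already allows instead of introducing new tags; whether <ph> is permitted at a given point depends on your DITA document type.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical taxonomy: entity type -> terms belonging to that type.
TAXONOMY = {
    "commodity": ["gas", "electricity"],
    "geography": ["California", "Monterey"],
}


def build_pattern(taxonomy):
    """Compile one regex for all terms and return it with a term -> type map."""
    term_to_type = {
        term.lower(): entity_type
        for entity_type, terms in taxonomy.items()
        for term in terms
    }
    # Longest terms first, so a longer term wins over a shorter one it contains.
    ordered = sorted(term_to_type, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)
    return pattern, term_to_type


def enrich(text, taxonomy, element="ph"):
    """Wrap each taxonomy term found in plain text in <element outputclass="type">."""
    pattern, term_to_type = build_pattern(taxonomy)
    parts = []
    position = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches so the result is valid XML content.
        parts.append(escape(text[position:match.start()]))
        entity_type = term_to_type[match.group(1).lower()]
        parts.append(
            "<{0} outputclass={1}>{2}</{0}>".format(
                element, quoteattr(entity_type), escape(match.group(1))
            )
        )
        position = match.end()
    parts.append(escape(text[position:]))
    return "".join(parts)


if __name__ == "__main__":
    print(enrich("Gas prices rose in California & beyond.", TAXONOMY))
```

In MarkLogic the matching would be done by the server, and only the step that maps an entity type to the element you want to emit would be yours to write. The sketch does both only so that it can run on its own.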

You can also see chapter 9 (~p109) of the Search Developer's Guide:

http://developer.marklogic.com/pubs/4.1/books/search-dev-guide.pdf

It outlines how entity enrichment works with MarkLogic Server.

I am not sure I understand your question about schema and DITA. Perhaps if you gave a specific example of what you are trying to do and what you are having trouble doing, we might be able to help you find a solution.

The sample pipelines that use third-party technologies (such as Temis) are designed to show integration with these other technologies. Entity extraction technologies are often very specialized to particular types of content, and MarkLogic can work with a wide array of different technologies.

Hope that helps,
-Danny

From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: Thursday, July 02, 2009 10:27 PM
To: [email protected]
Subject: [MarkLogic Dev General] Enrichment of content

Hi,

In my project, I shall be using Marklogic and we have a requirement for content enrichment. I have the content and a taxonomy structure defined. I want to enrich the content using that taxonomy structure. I would like to do the inline metadata tagging on the content. Following are my few questions:

1) From the enrich module API, I have understood that using the enrich function I can add the metadata on the given XML. Here it seems to me that the taxonomy structure and values based on which the metadata is tagged is managed by Marklogic. In my project, I have my own taxonomy definition for the marked up elements. I would like to use that taxonomy definition for enriching the content. How can I add that into Marklogic?

2) Secondly, I have noticed that if that XML has any schema defined and that does not allow children element, Marklogic does not enrich that node. That is fine. But if I send a DITA formatted XML, can I get a DITA formatted XML as output with the enriched content? It will be very helpful, if you can give some example on this topic.
I also would like to explore more on this topic. If you can provide me some more resource that will be great.

3) I have also seen that Marklogic has partnered with Temis Luxid for content enrichment. I could not understand that what Marklogic is providing and what Temis is doing extra on top of Marklogic. Any help in this regard will be great.

Thanks in advance.

regards,
Saptarshi Das

------------------------------

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
