Have you considered using PROV (http://www.w3.org/TR/2013/NOTE-prov-primer-20130430/) for this? You could, for example, put some PROV-XML in the envelope of the canonical document to record its provenance, then use TDE to create PROV-O triples from it. If you don’t have the source document URI at the time (e.g. you haven’t committed it yet), you could use a document ID that’s specific to your system (I’m sure all your documents have some sort of ID, right?), and then use TDE to create a triple for each document linking its ID to its URI. You should then be able to use SPARQL to run whatever provenance queries you need.
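
To make the shape of those triples concrete, here is a minimal sketch in plain Python rather than MarkLogic code. It only models the data in memory: in practice the triples would be produced by TDE and the lookup would be a SPARQL query. `prov:wasDerivedFrom` is a real PROV-O property; the `hasUri` predicate, the `example.org` namespace, and the document IDs and URIs are hypothetical names chosen for illustration.

```python
# Namespace of the W3C PROV-O vocabulary.
PROV = "http://www.w3.org/ns/prov#"
WAS_DERIVED_FROM = PROV + "wasDerivedFrom"

# Hypothetical predicate linking a system-specific document ID to its URI.
HAS_URI = "http://example.org/ns#hasUri"


def provenance_triples(entity_id, source_ids):
    """One (entity, wasDerivedFrom, source) triple per contributing source."""
    return [(entity_id, WAS_DERIVED_FROM, source_id) for source_id in source_ids]


def uri_triples(id_to_uri):
    """One (document ID, hasUri, URI) triple per document whose URI is known."""
    return [(doc_id, HAS_URI, uri) for doc_id, uri in id_to_uri.items()]


def entities_derived_from(triples, source_uri):
    """Find the canonical entities that have the given source URI as a source.

    Step 1 resolves the URI to document IDs; step 2 follows wasDerivedFrom
    backwards from those IDs to the entities.
    """
    source_ids = {s for s, p, o in triples if p == HAS_URI and o == source_uri}
    return sorted(
        {s for s, p, o in triples if p == WAS_DERIVED_FROM and o in source_ids}
    )


# Illustrative data only: one canonical entity built from two sources.
triples = (
    provenance_triples("entity-42", ["src-a", "src-b"])
    + uri_triples({"src-a": "/sources/a.xml", "src-b": "/sources/b.json"})
)

print(entities_derived_from(triples, "/sources/a.xml"))
```

Keeping the ID-to-URI link as its own triple means the provenance triples never need rewriting once a source document finally gets its URI.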

From: general-boun...@developer.marklogic.com <general-boun...@developer.marklogic.com> On Behalf Of Florent Georges
Sent: Friday, March 23, 2018 3:09 AM
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Using metadata to store references to several other docs?

Hi David,

Thank you for your response! Short answers to your questions: "none" and "yes and no" :-)

A bit of context, trying not to give too many details. We have several (technical) sources of data. We store these documents (often already processed with respect to the original input). We build a "canonical model" out of these documents. Several documents, in the same source or in several sources, can contribute to the same "canonical entity".

So there is no business meaning in that relationship. Business meaning is all captured in the canonical model, and it is the only layer meant to be consumed/queried by users. These links mean "I am one of the sources of this entity", so we can easily find all sources when one of them is updated/created/deleted, to recreate the canonical entity.

Whilst there must be a way to retrieve the canonical entity from a source document (given some ID or any other business or technical means, including a more complex query involving other documents), there is not necessarily a way to retrieve all source docs from the entity. The list of possible sources (and their types and their document structures) will evolve over time, so we cannot make any assumptions about them (besides that they will be XML or JSON; we can always handle binary with one indirection).

I guess the options are:

- good ol' envelope pattern
- TDE to expose several triples out of a "composite" metadata value
- using several metadata (ref-1, ref-2, ref-3...)

Have I forgotten anything?

Regards,

--
Florent Georges
H2O Consulting
http://h2o.consulting/

On 22 March 2018 at 21:09, David Gorbet wrote:

What is the actual business meaning of the relationships between the documents?
And is there something in the document that indicates this relationship, just not with the doc URI?

From: general-boun...@developer.marklogic.com <general-boun...@developer.marklogic.com> On Behalf Of Florent Georges
Sent: Thursday, March 22, 2018 1:06 PM
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: [MarkLogic Dev General] Using metadata to store references to several other docs?

Hi,

I need to store references from one document to another. The mechanism is generic and cannot be tied to a particular type of document, so it is difficult to store the reference inside the document.

What looks like a perfect solution is to use a metadata key with a specific name, the value of which is the URI of the target document. With a field on that metadata, I can search for the source of any target URI.

But some documents can point to several URIs. And as far as I can tell, it is not possible to have several values for a given metadata key (on the same document).

Any idea how I can store such references without having to modify the content of the documents? As I would like the references to "live" with the source doc, I would like to avoid using managed triples.

The closest I can think of is to use the metadata to store several URIs in one string, using a separator, and have TDE expose as many triples as there are URIs. But I am not entirely sure I can access metadata in TDE (and loop over tokenize($value, '|') to create several triples).

Am I missing anything obvious?

Regards,

--
Florent Georges
H2O Consulting
http://h2o.consulting/

_______________________________________________
General mailing list
General@developer.marklogic.com
Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general