Have you considered using PROV 
(http://www.w3.org/TR/2013/NOTE-prov-primer-20130430/) for this? You could, for 
example, put some PROV-XML in the envelope of the canonical document to record 
that document’s provenance, then use TDE to create PROV-O triples from it. If 
you don’t have the source document URI at the time (e.g. you haven’t committed 
it yet), you could use a document ID that’s specific to your system (I’m sure 
all your documents have some sort of ID, right?), and then for each document 
use TDE to create a triple linking the document ID to its URI. You should then 
be able to use SPARQL to run whatever provenance queries you need.
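
To make the two kinds of triples concrete, here is a minimal in-memory sketch in Python. It is not MarkLogic, TDE, or SPARQL code; it only models the shape of the data. The prov:wasDerivedFrom IRI comes from PROV-O, while the hasUri predicate, the function names, and the sample IDs and URIs are assumptions made up for illustration.

```python
# PROV-O predicate for "this entity was derived from that source".
PROV_WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"
# Hypothetical predicate linking a system-specific document ID to its URI.
HAS_URI = "http://example.org/hasUri"


def provenance_triples(entity_id, source_ids):
    """One provenance triple per source document ID."""
    return [(entity_id, PROV_WAS_DERIVED_FROM, s) for s in source_ids]


def uri_triples(id_to_uri):
    """One triple per document, linking its ID to its URI."""
    return [(doc_id, HAS_URI, uri) for doc_id, uri in id_to_uri.items()]


def source_uris(triples, entity_id):
    """Resolve the URIs of all sources of an entity (a two-pattern join)."""
    sources = {
        o for s, p, o in triples
        if s == entity_id and p == PROV_WAS_DERIVED_FROM
    }
    return sorted(o for s, p, o in triples if p == HAS_URI and s in sources)


triples = provenance_triples("entity-1", ["doc-a", "doc-b"]) + uri_triples(
    {"doc-a": "/src/a.xml", "doc-b": "/src/b.json", "doc-c": "/src/c.xml"}
)
print(source_uris(triples, "entity-1"))
```

The lookup in source_uris corresponds to what a SPARQL query with two triple patterns would do: first match the provenance triples for the entity, then join on the document ID to get the URI.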

From: general-boun...@developer.marklogic.com 
<general-boun...@developer.marklogic.com> On Behalf Of Florent Georges
Sent: Friday, March 23, 2018 3:09 AM
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: Re: [MarkLogic Dev General] Using metadata to store references to 
several other docs?

Hi David,
Thank you for your response!  Short answers to your questions: "none" and "yes 
and no" :-)
A bit of context, trying not to give too many details.  We have several 
(technical) sources of data.  We store these documents (often already processed 
WRT the original input).  We build a "canonical model" out of these documents.  
Several documents, in the same source or in several sources, can contribute to 
the same "canonical entity".
So there is no business meaning in that relationship.  Business meaning is all 
captured in the canonical model, and it is the only layer meant to be 
consumed/queried by users.  These links mean "I am one of the sources of this 
entity", so we can easily find all sources when one of them is 
updated/created/deleted, to recreate the canonical entity.
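
As a rough sketch of that lookup (plain Python, not MarkLogic code), the links can be modelled as (source URI, entity ID) pairs. The function names and sample values below are hypothetical and only illustrate the "find all sources of the affected entities" step.

```python
from collections import defaultdict


def build_source_index(links):
    """Map each canonical entity ID to the set of source URIs feeding it."""
    index = defaultdict(set)
    for source_uri, entity_id in links:
        index[entity_id].add(source_uri)
    return index


def sources_to_reprocess(links, changed_source_uri):
    """For each entity the changed source contributes to, list all its sources."""
    index = build_source_index(links)
    entities = {e for s, e in links if s == changed_source_uri}
    return {e: sorted(index[e]) for e in sorted(entities)}


# Hypothetical links: "/a.xml" contributes to two entities.
links = [
    ("/a.xml", "e1"),
    ("/b.json", "e1"),
    ("/c.xml", "e2"),
    ("/a.xml", "e2"),
]
print(sources_to_reprocess(links, "/b.json"))
```

When a source contributes to several entities, as "/a.xml" does here, a change to it returns every affected entity together with its full list of sources.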
Whilst there must be a way to retrieve the canonical entity from a source 
document (given some ID or any other business or technical means, including a 
more complex query involving other documents), there is not necessarily a way 
to retrieve all source docs from the entity.
The list of possible sources (and their types and their document structures) 
will evolve over time, so we cannot make any assumptions about them (beyond 
that they will be XML or JSON; we can always handle binary with one indirection).
I guess the options are:
- good ol' envelope pattern
- TDE to expose several triples out of a "composite" metadata value
- using several metadata keys (ref-1, ref-2, ref-3...)
Have I forgotten anything?
Regards,

--
Florent Georges
H2O Consulting
http://h2o.consulting/


On 22 March 2018 at 21:09, David Gorbet wrote:
What is the actual business meaning of the relationships between the documents? 
And is there something in the document that indicates this relationship, just 
not with the doc URI?

From: general-boun...@developer.marklogic.com 
<general-boun...@developer.marklogic.com> On Behalf Of Florent Georges
Sent: Thursday, March 22, 2018 1:06 PM
To: MarkLogic Developer Discussion <general@developer.marklogic.com>
Subject: [MarkLogic Dev General] Using metadata to store references to several 
other docs?

Hi,

I need to store references from one document to another.  The mechanism is 
generic and cannot be tied to a particular type of document.  It is then 
difficult to store the reference inside the document.

What looks like a perfect solution is to use a metadata key with a specific 
name, the value of which is the URI of the target document.  With a field on 
that metadata key, I can search for the source of any target URI.

But some documents can point to several URIs.  And as far as I can tell, it is 
not possible to have several values for a given metadata key (on the same 
document).

Any idea how I can store such references without modifying the content of the 
documents?

As I would like the references to "live" with the source doc, I would like to 
avoid using managed triples.

The closest I can think of is to use the metadata to store several URIs in one 
string, using a separator, and have TDE expose as many triples as there are 
URIs.  But I am not sure I can access metadata in TDE (and loop over 
tokenize($value, '\|') to create several triples).
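
To illustrate the intended result only (this is plain Python, not TDE, and it does not settle whether TDE can read metadata), here is a minimal sketch that splits one composite value into one triple per target URI. The predicate IRI, the function name, and the sample URIs are assumptions for illustration.

```python
def triples_from_composite(source_uri, composite,
                           predicate="http://example.org/sourceOf", sep="|"):
    """Split a separator-joined metadata value into one triple per target URI.

    The predicate IRI is a placeholder, not part of any real vocabulary.
    """
    targets = [t.strip() for t in composite.split(sep)]
    # Skip empty tokens so an empty value or a trailing separator yields no triple.
    return [(source_uri, predicate, t) for t in targets if t]


for triple in triples_from_composite("/src/a.xml", "/canon/1.xml|/canon/2.xml"):
    print(triple)
```

One caveat of this design: the separator must never occur inside a URI, otherwise a single reference would be split into two.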

Am I missing anything obvious?

Regards,

--
Florent Georges
H2O Consulting
http://h2o.consulting/



_______________________________________________
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
