Hi David,

Thank you for your response, and sorry for taking so long to reply.

I initially wanted to keep the source association out of the canonical entities, so as to keep the staging areas truly hidden, except where ingestion is concerned. But you convinced me to try the other way around, as PROV does, and it did make a few problems easier to solve. So we have something PROV-compatible now (so most likely we'll use actual PROV triples), and yes, we use TDE to project the semantic layer on top of the canonical entities ;-)

Thank you,

-- 
Florent Georges
H2O Consulting
http://h2o.consulting/

On 23 March 2018 at 15:58, David Gorbet wrote:

> Have you considered the use of PROV (http://www.w3.org/TR/2013/NOTE-prov-primer-20130430/) for this? You could, for example, put some PROV-XML in your envelope on the canonical document that indicates the provenance of this document, then use TDE to create PROV-O triples out of that. If you don’t have the source document URI at the time (e.g. you haven’t committed it yet), you could use a document ID that’s specific to your system (I’m sure all your documents have some sort of ID, right?), and then for each document use TDE to create a triple linking the document ID to its URI. You should then be able to use SPARQL to do whatever provenance queries you need.
>
> *From:* general-boun...@developer.marklogic.com <general-boun...@developer.marklogic.com> *On Behalf Of* Florent Georges
> *Sent:* Friday, March 23, 2018 3:09 AM
> *To:* MarkLogic Developer Discussion <general@developer.marklogic.com>
> *Subject:* Re: [MarkLogic Dev General] Using metadata to store references to several other docs?
>
> Hi David,
>
> Thank you for your response! Short answers to your questions: "none" and "yes and no" :-)
>
> A bit of context, trying not to give too many details. We have several (technical) sources of data. We store these documents (often already processed with respect to the original input). We build a "canonical model" out of these documents.
> Several documents, in the same source or in several sources, can contribute to the same "canonical entity".
>
> So there is no business meaning in that relationship. Business meaning is all captured in the canonical model, and it is the only layer meant to be consumed/queried by users. These links mean "I am one of the sources of this entity", so we can easily find all sources when one of them is updated/created/deleted, to recreate the canonical entity.
>
> Whilst there must be a way to retrieve the canonical entity from a source document (given some ID or any other business or technical means, including a more complex query involving other documents), there is not necessarily a way to retrieve all source docs from the entity.
>
> The list of possible sources (and their types and their document structures) will evolve over time, so we cannot make any assumptions about them (besides that they will be XML or JSON; we can always handle binary with one indirection).
>
> I guess the options are:
>
> - good ol' envelope pattern
> - TDE to expose several triples out of a "composite" metadata value
> - using several metadata (ref-1, ref-2, ref-3...)
>
> Have I forgotten anything?
>
> Regards,
>
> --
> Florent Georges
> H2O Consulting
> http://h2o.consulting/
>
> On 22 March 2018 at 21:09, David Gorbet wrote:
>
> What is the actual business meaning of the relationships between the documents? And is there something in the document that indicates this relationship, just not with the doc URI?
>
> *From:* general-boun...@developer.marklogic.com <general-boun...@developer.marklogic.com> *On Behalf Of* Florent Georges
> *Sent:* Thursday, March 22, 2018 1:06 PM
> *To:* MarkLogic Developer Discussion <general@developer.marklogic.com>
> *Subject:* [MarkLogic Dev General] Using metadata to store references to several other docs?
>
> Hi,
>
> I need to store references from one document to another.
> The mechanism is generic and cannot be tied to a particular type of document. It is therefore difficult to store the reference inside the document.
>
> What looks like a perfect solution is to use a metadata key with a specific name, the value of which is the URI of the target document. With a field on that metadata, I can search for the source of any target URI.
>
> But some documents can point to several URIs. And as far as I can tell, it is not possible to have several values for a given metadata key (on the same document).
>
> Any idea how I can store such references without having to modify the content of the documents?
>
> As I would like the references to "live" with the source doc, I would like to avoid using managed triples.
>
> The closest I can think of is to use the metadata to store several URIs in one string, using a separator, and have TDE expose as many triples as there are URIs. But I am not entirely sure I can access metadata in TDE (and loop over tokenize($value, '|') to create several triples).
>
> Am I missing anything obvious?
>
> Regards,
>
> --
> Florent Georges
> H2O Consulting
> http://h2o.consulting/

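For readers of the archive, the "several URIs in one string" idea from the original question can be sketched outside MarkLogic. The Python below is only an illustration of the packing and splitting logic, not MarkLogic or TDE code. The function names, the example URIs, and the predicate IRI `http://example.org/ns/refers-to` are all hypothetical, and it assumes the `|` separator never occurs inside a document URI.

```python
from typing import List, Tuple

SEPARATOR = "|"  # same separator as in tokenize($value, '|') above
REFERS_TO = "http://example.org/ns/refers-to"  # hypothetical predicate IRI

Triple = Tuple[str, str, str]


def pack_refs(uris: List[str]) -> str:
    """Join several target URIs into one single-valued metadata string."""
    for uri in uris:
        # Assumption: the separator is not a legal character in our URIs.
        if SEPARATOR in uri:
            raise ValueError(f"URI contains the separator: {uri!r}")
    return SEPARATOR.join(uris)


def unpack_refs(value: str) -> List[str]:
    """Split the metadata string back into URIs, ignoring empty parts."""
    return [part for part in value.split(SEPARATOR) if part]


def refs_to_triples(source_uri: str, value: str) -> List[Triple]:
    """Produce one (subject, predicate, object) tuple per referenced URI."""
    return [(source_uri, REFERS_TO, target) for target in unpack_refs(value)]


if __name__ == "__main__":
    packed = pack_refs(["/canonical/a.xml", "/canonical/b.json"])
    for triple in refs_to_triples("/staging/src-1.xml", packed):
        print(triple)
```

Whether the same split can be done from metadata inside a TDE template is exactly the open question in the message above; this sketch does not answer it.
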
_______________________________________________
General mailing list
General@developer.marklogic.com
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general