Thank you all for your responses and the interesting conversation about RDF serialization into ES. With regard to my original post, I ended up using a solution based on RDFLib:

https://github.com/RDFLib/rdflib-jsonld

It works as expected, and compacting the content by using @context does the trick and is flexible. It is an in-memory process, however, which could be an issue for those with very large RDF files. When using Jena, I didn't find the ability to add @context mappings, but maybe I didn't dig enough.

On a side note, it looks like the rdflib-jsonld solution already has support for XSD literals and lists, so perhaps it could be extended to map directly into ES _type if that is a good direction.

With my JSON-LD file ready for ingestion into ES, I do have another question: are there utilities to bulk load such documents (the JSON-LD file contains the individual documents for ES, each with an _id), or do I just write a script that calls curl -XPUT for each record in the JSON-LD file? Seems like a pretty common use case.

Thanks again to all, interesting stuff. Happy to contribute to extending an existing solution.

Amine

On Saturday, September 27, 2014 9:24:24 AM UTC-7, Jörg Prante wrote:
>
> For the _mapping, I am thinking about two more types for which I intend to write ES type mappers, "iri" and "literal", so ES can receive XSD data types and language codes and map them to fields / analyzers. IRIs are just opaque strings, but they can be shortened if a prefix is configured, and they can be used as an _id or for referencing an _id.
>
> Instead of _mapping, I prefer the idea of handling @contexts like template documents.
>
> I'm not sure about the best way to manage JSON-LD. There are two approaches. One is to save the JSON-LD (what you call the original document) beside other versions. This requires more space, and I'm not sure about the purpose of the original JSON-LD. The other approach is to drop the original JSON-LD after parsing it to triples and store the triples in an ES JSON doc, which is a surrogate close to JSON-LD but accommodates all the JSON dialect characteristics of the ES document DSL.
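As an illustration of the second approach described in the paragraph above (parse to triples, then store them as one JSON document per resource), here is a minimal sketch in Python. It is not taken from any code discussed in this thread: the function name `triples_to_docs`, the tuple layout, and the example.org data are all assumptions made for the example, and literals are treated as plain strings with no XSD type or language tag.

```python
from collections import defaultdict


def triples_to_docs(triples):
    """Group (subject, predicate, object) triples into one dict per subject.

    Hypothetical helper: predicates become field names, objects become values.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        grouped[subject][predicate].append(obj)

    docs = {}
    for subject, fields in grouped.items():
        # A predicate seen once stays a scalar; repeated predicates become arrays.
        docs[subject] = {
            predicate: values[0] if len(values) == 1 else values
            for predicate, values in fields.items()
        }
    return docs


# Made-up sample data using compact "dc:" field names.
triples = [
    ("http://example.org/book/1", "dc:title", "Linked Data"),
    ("http://example.org/book/1", "dc:creator", "Alice"),
    ("http://example.org/book/1", "dc:creator", "Bob"),
    ("http://example.org/book/2", "dc:title", "Search Engines"),
]
docs = triples_to_docs(triples)
```

Each key of `docs` is a subject IRI that could serve as the ES _id, and each value is a flat JSON-like document. Nested resources and blank nodes are deliberately left out of this sketch.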
>
> I'm not into Scala, so I cannot promise much, but I'd be happy to get a glimpse of all the related code!
>
> Jörg
>
>
> On Sat, Sep 27, 2014 at 4:50 PM, Alfredo Serafini <ser...@gmail.com> wrote:
>
>> Hi Jörg, indeed! :-)
>>
>> What I like about _mappings is that they are managed as documents too, and they can be:
>>
>> 1. automatically inferred from data (risky, but useful)
>> 2. provided by static files, in some cases
>> 3. managed for _index/_types
>>
>> All those things could be done with something like a _context (which would at first include a single @context). The first point should probably be avoided altogether for JSON-LD :-), but it should be possible.
>>
>> But we may need more @context items for a single "resource" schema (referring to _index/_type), and in the long run it's even possible to re-use a @context for different _index/_type pairs.
>> Furthermore: when exposing results in JSON-LD, one might want to reference an external @context and merge it before providing results. In my opinion the more "risky" part is the input of the original JSON-LD, if we want to flatten it and extract the @context, which will permit us to reconstruct the original document later.
>> Given that it could be possible to map every kind of JSON result from ES, documents imported as JSON-LD might have to maintain at least the original fields.
>>
>> I'd like to put some code on GitHub, and if you want we could join efforts on that. I'm working mostly in Scala at the moment. What do you think?
>>
>>
>>
>>
>> On Friday, September 26, 2014 20:32:52 UTC+2, Jörg Prante wrote:
>>>
>>> Absolutely. My thought is about managing one (or more) context ES JSON document(s) where all the @context definitions of an index live. A format plugin can then process search results and convert ES JSON to expanded JSON-LD and from there to other RDF serializations.
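To make the idea in the paragraph above a bit more concrete, the following Python sketch takes a prefix map (standing in for a stored @context document) and rewrites the compact field names of a search hit into full IRIs. This is only prefix expansion of keys, not the full JSON-LD expansion algorithm from the W3C spec, and it is not the plugin discussed here. The names `expand_term` and `expand_keys` and the sample hit are assumptions for illustration.

```python
def expand_term(term, context):
    """Expand 'prefix:local' to a full IRI if the prefix is defined in context."""
    if term.startswith("@"):
        # JSON-LD keywords such as @id are left untouched.
        return term
    prefix, sep, local = term.partition(":")
    if sep and prefix in context and not local.startswith("//"):
        return context[prefix] + local
    return term


def expand_keys(doc, context):
    """Recursively rewrite the keys of an ES-style JSON document."""
    if isinstance(doc, dict):
        return {expand_term(k, context): expand_keys(v, context) for k, v in doc.items()}
    if isinstance(doc, list):
        return [expand_keys(item, context) for item in doc]
    return doc


# Hypothetical stored context and search hit.
context = {"dc": "http://purl.org/dc/elements/1.1/"}
hit = {
    "@id": "http://example.org/book/1",
    "dc:title": "Linked Data",
    "dc:creator": ["Alice", "Bob"],
}
expanded = expand_keys(hit, context)
```

Only field names are rewritten. Values are passed through unchanged, so a real implementation would still need to handle @value, @type and @language as the JSON-LD spec requires.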
>>>
>>> Jörg
>>>
>>> On Fri, Sep 26, 2014 at 6:23 PM, Alfredo Serafini <ser...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> using JSON-LD is indeed rather simple, as it is JSON, and so it's even possible to index it as is.
>>>> I'm currently using ES for storing RDF documents in JSON-LD on a specific index: in that case one can simply use the URI as an _id, recover the full original format via _source, and use basic search capabilities on the index, if escaping / nesting is not a big deal.
>>>>
>>>> However, in order to use resources with some more flexibility, I think the best approach would be to index them as "flat" as possible, then apply an ad-hoc @context to the ES JSON to obtain the original JSON-LD again.
>>>> This would be my ideal usage at the moment. It seems complex at first, but it's not: I'm currently experimenting with saving a @context for a _type, obtaining, let's say, a sort of _context, similar to a _mapping, to reconstruct the original semantics.
>>>> If someone likes the idea, I'd like to share thoughts on that :-)
>>>>
>>>>
>>>> On Friday, September 26, 2014 14:08:07 UTC+2, Jörg Prante wrote:
>>>>>
>>>>> Lukáš,
>>>>>
>>>>> of course you are right, RDF/XML looks complex and requires parsing. The underlying principle of all RDF is a graph (or a series of triples in the form of subject/predicate/object, where the triple series is a serialization of the graph). So the challenge is first parsing the RDF input, second constructing the model, and third serializing the model to an ES-friendly input (here: JSON-LD, sort of). RDF ensures that there is a single model for all serializations.
>>>>>
>>>>> This technical perspective does not necessarily solve all challenges that are inherent to the chosen data model. For example, nested resources in RDF.
>>>>> It might be feasible to flatten nested resources by their identifiers and generate one JSON document after the other. Or it could be feasible to keep nested resources intact and wrap them into nested structures in a single ES JSON object.
>>>>>
>>>>> In my data model, I can map RDF subject IDs to ES doc IDs. Other data models may prefer other approaches to select ES doc IDs.
>>>>>
>>>>> Jörg
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Sep 26, 2014 at 10:11 AM, Lukáš Vlček <lukas...@gmail.com> wrote:
>>>>>
>>>>>> Jörg,
>>>>>>
>>>>>> my concern is that RDF/XML allows one thing to be expressed in several ways. For example, if you take the FOAF specification, there are several ways to express that one Person knows another Person. One way is using reference IDs, another is nesting a Person inside another Person. See [1] for examples. My understanding is that although both ways express exactly the same information, they lead to different XML representations and thus to different JSON-LD. Note that you can push such data into ES, but I wonder if you can then have any consistent way of querying such data.
>>>>>>
>>>>>> Maybe there is some way to preprocess the XML document and convert all nested Persons to references (would that require arbitrary ID construction?). Or something similar. Though I am not sure this would be an approach generally applicable to any RDF data.
>>>>>>
>>>>>> [1] http://www.xml.com/pub/a/2004/02/04/foaf.html
>>>>>>
>>>>>> Regards,
>>>>>> Lukas
>>>>>>
>>>>>> On Fri, Sep 26, 2014 at 9:28 AM, joerg...@gmail.com <joerg...@gmail.com> wrote:
>>>>>>
>>>>>>> JSON-LD is perfect for ES indexing, as long as you use the "compact" form of representation.
>>>>>>>
>>>>>>> http://www.w3.org/TR/json-ld-api/#compaction-algorithms
>>>>>>>
>>>>>>> Example:
>>>>>>>
>>>>>>> https://github.com/lanthaler/JsonLD/blob/master/Test/Fixtures/sample-compacted.jsonld
>>>>>>>
>>>>>>> This means you should use short field names and shorten IRIs to a prefix form. This gives a convenient mapping to ES field names (e.g. "dc:title" or "dc:creator"). The '@' fields can also be indexed, and they do not control anything special in ES (some @id may be mapped to the ES _id, but for nested structures this does not match).
>>>>>>>
>>>>>>> I use my own RDF API and transform RDF graphs (so not only JSON-LD but also other formats like N-Triples and RDF/XML) into XContent using this method:
>>>>>>>
>>>>>>> https://github.com/xbib/xbib/blob/master/content/src/main/java/org/xbib/rdf/content/DefaultResourceContentBuilder.java
>>>>>>>
>>>>>>> I plan to extend this content building by interpreting rdf:type and rdf:list etc. to generate correct ES JSON objects and arrays. There is also a fair amount of work left to do for the plethora of XSD types in RDF literals and for language tags.
>>>>>>>
>>>>>>> This will be subsumed into an RDF input/output plugin for an ES-based Linked Data Platform
>>>>>>>
>>>>>>> http://www.w3.org/TR/ldp/
>>>>>>>
>>>>>>> but there is no ETA yet.
>>>>>>>
>>>>>>> Jörg
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 26, 2014 at 5:08 AM, Lukáš Vlček <lukas...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I think you will have to preprocess documents on your side first and then push them into ES individually (you can push in batches).
>>>>>>>>
>>>>>>>> As a side note, I would say JSON-LD is quite a low-level serialization of RDF data, IMO not optimal for ES indexing.
>>>>>>>> It may be better to find some RDF-OOM tool, have your RDF documents mapped to Java POJOs, and serialize the POJOs into JSON instead (you can use the Jackson library for that, for example). This will give you better control over the whole RDF -> JSON conversion process.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Lukas
>>>>>>>>
>>>>>>>> On Thu, Sep 25, 2014 at 7:21 PM, abo <a...@datavolution.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I'm new to Elasticsearch, so forgive me if this is a basic question or if it's in some documentation that I haven't read...
>>>>>>>>>
>>>>>>>>> I am trying to load a JSON-LD file into ES. The JSON-LD file was generated from an RDF file, using Jena. The structure starts with:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>> "@graph" :
>>>>>>>>>
>>>>>>>>> followed by the individual "documents", each with:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>> "@id" :
>>>>>>>>>
>>>>>>>>> and a variable number of parameters in each.
>>>>>>>>>
>>>>>>>>> My question is how do I load this into ES and ensure that documents are individually referenced (as opposed to the entire JSON-LD file)?
>>>>>>>>>
>>>>>>>>> Do I need to doctor this JSON-LD file further in order to load it?
>>>>>>>>>
>>>>>>>>> Thanks for your help.
>>>>>>>>>
>>>>>>>>> -- abo
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
>>>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ec26bbe7-5bb1-4c50-96c4-8f586e1e0807%40googlegroups.com.
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
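
On the bulk-loading question raised at the top of this thread: Elasticsearch has a bulk API (the `_bulk` endpoint) that accepts newline-delimited JSON, with one action line followed by one document line per record. The Python sketch below builds such a body from a JSON-LD file whose top level holds an "@graph" array, using each node's "@id" as the ES _id. The function name `jsonld_to_bulk`, the index and type names, and the sample data are assumptions for illustration, and the `_type` field reflects the ES versions current at the time of this thread.

```python
import json


def jsonld_to_bulk(jsonld, index, doc_type):
    """Build an Elasticsearch bulk request body from a JSON-LD @graph.

    Hypothetical helper: every node in @graph becomes one "index" action.
    """
    lines = []
    for node in jsonld.get("@graph", []):
        # The node's @id is reused as the ES document _id.
        action = {"index": {"_index": index, "_type": doc_type, "_id": node["@id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(node))
    # The bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


# Made-up sample standing in for a compacted JSON-LD file.
sample = {
    "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
    "@graph": [
        {"@id": "http://example.org/book/1", "dc:title": "Linked Data"},
        {"@id": "http://example.org/book/2", "dc:title": "Search Engines"},
    ],
}
body = jsonld_to_bulk(sample, "books", "book")
```

If the body is written to a file (say `bulk.ndjson`, a made-up name), it can be sent in one request with `curl -XPOST localhost:9200/_bulk --data-binary @bulk.ndjson` instead of one curl -XPUT per record. For very large files the body would need to be split into batches.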