Re: Loading JSON-LD into ES

abo Sat, 27 Sep 2014 19:24:02 -0700

Thank you all for your responses and the interesting conversation about RDF
serialization into ES. With regard to my original post, I ended up using a
solution based on RDFLib:


https://github.com/RDFLib/rdflib-jsonld

It works as expected, and compacting the content by using @context does the
trick and is flexible. It is an in-memory process, however, which could be
an issue for those with very large RDF files. When using Jena, I didn't
find a way to add @context mappings, but maybe I didn't dig enough.
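
As a rough illustration of what compaction buys for ES field names, here is
a minimal Python sketch that shortens full property IRIs to prefix:name keys
using a prefix table. It is a simplified stand-in, not the JSON-LD compaction
algorithm that rdflib-jsonld implements; the function name compact_keys, the
sample document, and the example.org IRI are assumptions made up for
illustration.

```python
def compact_keys(node, prefixes):
    """Shorten full property IRIs to prefix:name form using a prefix table.

    Simplified stand-in for JSON-LD compaction: only dict keys are rewritten,
    and JSON-LD keywords (keys starting with "@") are left untouched.
    """
    if isinstance(node, list):
        return [compact_keys(item, prefixes) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        short = key
        if not key.startswith("@"):
            for prefix, base in prefixes.items():
                if key.startswith(base):
                    short = prefix + ":" + key[len(base):]
                    break
        out[short] = compact_keys(value, prefixes)
    return out


# Hypothetical prefix table and expanded document, for illustration only.
context = {"dc": "http://purl.org/dc/elements/1.1/"}
expanded = {
    "@id": "http://example.org/doc/1",
    "http://purl.org/dc/elements/1.1/title": [{"@value": "First"}],
}

compacted = compact_keys(expanded, context)
print(compacted)
```

The short keys ("dc:title" instead of the full IRI) are what make the
compacted form convenient to use as ES field names.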

On a side note, it looks like the rdflib-jsonld solution already has support
for XSD literals and lists, so perhaps it could be extended to map directly
into ES _type if that is a good direction.

With my JSON-LD file ready for ingestion into ES, I do have another
question: are there utilities to bulk load such documents (the JSON-LD file
contains the individual ES documents, each with an _id), or do I just write
a script that calls curl -XPUT for each record in the JSON-LD file? This
seems like a pretty common use case.
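
One possible answer to the bulk-loading question, sketched under
assumptions: Elasticsearch has a _bulk endpoint that accepts a
newline-delimited body in which each document is preceded by an action line.
The Python sketch below only builds such a body from the "@graph" array and
does not send it anywhere. The function name bulk_body, the index name "rdf",
and the sample document are hypothetical; whether a _type is needed depends
on the ES version.

```python
import json


def bulk_body(jsonld, index, doc_type=None):
    """Build an Elasticsearch _bulk request body from a JSON-LD document.

    Each node in "@graph" becomes one document, with its "@id" used as _id.
    """
    lines = []
    for node in jsonld.get("@graph", []):
        meta = {"_index": index, "_id": node["@id"]}
        if doc_type is not None:
            # Older ES versions expect a _type; newer ones dropped it.
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(node))             # document source line
    # The bulk body is newline-delimited and must end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""


# Hypothetical sample input, for illustration only.
sample = {
    "@context": {"dc": "http://purl.org/dc/elements/1.1/"},
    "@graph": [
        {"@id": "http://example.org/doc/1", "dc:title": "First"},
        {"@id": "http://example.org/doc/2", "dc:title": "Second"},
    ],
}

body = bulk_body(sample, index="rdf")
print(body, end="")
```

The resulting text would be sent in a single POST to the _bulk endpoint
rather than issuing one PUT per record; for very large files the body would
have to be split into batches.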

Thanks again to all, interesting stuff. Happy to contribute to extending an 
existing solution.

-- ab

On Saturday, September 27, 2014 9:24:24 AM UTC-7, Jörg Prante wrote:
>
> For the _mapping, I am thinking of two more types for which I intend to
> write ES type mappers, "iri" and "literal", so that ES can receive XSD data
> types and language codes and map them to fields / analyzers. IRIs are just
> opaque strings, but they can be shortened if a prefix is configured, and
> they can be used as _id or to reference an _id.
>
> Instead of _mapping, I prefer the idea of handling @contexts like template
> documents.
>
> I'm not sure about the best way to manage JSON-LD. There are two
> approaches. The first is to save the JSON-LD (what you call the original
> document) beside other versions. This requires more space, and I'm not sure
> about the purpose of the original JSON-LD. The other approach is to drop
> the original JSON-LD after parsing it to triples and to store the triples
> in an ES JSON doc, which is a surrogate close to JSON-LD but conforms to
> all the JSON dialect characteristics of the ES document DSL.
>
> I'm not a Scala user, so I cannot promise much, but I'd be happy to have a
> look at any related code!
>
> Jörg
>
>
> On Sat, Sep 27, 2014 at 4:50 PM, Alfredo Serafini <ser...@gmail.com> wrote:
>
>> Hi Jörg, indeed! :-)
>>
>> What I like about _mappings is that they are managed as documents too, and
>> they can be:
>>
>>    1. automatically inferred from data (at risk, but useful)
>>    2. provided by static files, in some cases
>>    3. managed for _index/_types
>>
>> All of those things could be done with something like a _context (which
>> would at first include a single @context). The first point should probably
>> be avoided altogether for JSON-LD :-), but it should be possible.
>>
>> But we may need more than one @context for a single "resource" schema
>> (referring to an _index/_type pair), and in the long run it should even be
>> possible to re-use a @context for different _index/_type pairs.
>> Furthermore, when exposing results as JSON-LD, one might want to reference
>> an external @context and merge it before providing results. In my opinion,
>> the more "risky" part is ingesting the original JSON-LD, if we want to
>> flatten it and extract the @context that will permit us to reconstruct the
>> original document later.
>> Given that it should be possible to map every kind of JSON result from ES,
>> documents imported as JSON-LD might have to maintain at least the original
>> fields.
>>
>> I'd like to put some code on GitHub, and if you want, we could join forces
>> on that. I'm working mostly in Scala at the moment. What do you think?
>>
>>
>>
>>
>> On Friday, September 26, 2014 8:32:52 PM UTC+2, Jörg Prante wrote:
>>>
>>> Absolutely. My thought is about managing one (or more) context ES JSON
>>> document(s) where all the @context definitions of an index live. A format
>>> plugin can then process search results and convert ES JSON to expanded
>>> JSON-LD and from there to other RDF serializations.
>>>
>>> Jörg
>>>
>>> On Fri, Sep 26, 2014 at 6:23 PM, Alfredo Serafini <ser...@gmail.com> 
>>> wrote:
>>>
>>>> Hi 
>>>>
>>>> Using JSON-LD is indeed rather simple, as it is JSON, so it's even
>>>> possible to index it as is.
>>>> I'm currently using ES for storing RDF documents as JSON-LD in a
>>>> specific index: in that case one can simply use the URI as an _id,
>>>> recover the full original format from _source, and use basic search
>>>> capabilities on the index, if escaping / nesting is not a big deal.
>>>>
>>>> However, in order to use resources with some more flexibility, I think
>>>> the best approach would be to index them as "flat" as possible, then
>>>> apply an ad-hoc @context to the ES JSON to obtain the original JSON-LD
>>>> again.
>>>> This would be my ideal usage at the moment. It seems complex at first,
>>>> but it's not: I'm currently experimenting with saving a @context for a
>>>> _type, obtaining, let's say, a sort of _context, similar to a _mapping,
>>>> to reconstruct the original semantics.
>>>> If someone likes the idea, I'd like to share thoughts on that :-)
>>>>
>>>>
>>>> On Friday, September 26, 2014 2:08:07 PM UTC+2, Jörg Prante wrote:
>>>>>
>>>>> Lukáš,
>>>>>
>>>>> of course you are right, RDF/XML looks complex and requires parsing.
>>>>> The underlying principle of all RDF is a graph (or a series of triples
>>>>> in the form subject/predicate/object, where the triple series is a
>>>>> serialization of the graph). So the challenge is first parsing the RDF
>>>>> input, second constructing the model, and third serializing the model
>>>>> to an ES-friendly input (here: JSON-LD, sort of). RDF ensures that
>>>>> there is a single model for all serializations.
>>>>>
>>>>> This technical perspective does not necessarily solve all challenges
>>>>> that are inherent to the chosen data model, for example nested
>>>>> resources in RDF. It might be feasible to flatten nested resources by
>>>>> their identifiers and generate one JSON document after the other. Or it
>>>>> could be feasible to keep nested resources intact and wrap them into
>>>>> nested structures in a single ES JSON object.
>>>>>
>>>>> In my data model, I can map RDF subject IDs to ES doc IDs. Other data 
>>>>> models may prefer other approaches to select ES doc IDs.
>>>>>
>>>>> Jörg
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Sep 26, 2014 at 10:11 AM, Lukáš Vlček <lukas...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> Jörg,
>>>>>>
>>>>>> my concern is that RDF/XML allows one thing to be expressed in several
>>>>>> ways. For example, if you take the FOAF specification, there are
>>>>>> several ways to express that one Person knows another Person. One way
>>>>>> is using reference IDs; the other is nesting a Person inside another
>>>>>> Person. See [1] for examples. My understanding is that although both
>>>>>> ways express exactly the same information, they lead to different XML
>>>>>> representations and thus to different JSON-LD. Note that you can push
>>>>>> such data into ES, but I wonder if you can then have any consistent
>>>>>> way of querying such data.
>>>>>>
>>>>>> Maybe there is some way to preprocess the XML document and convert
>>>>>> all nested Persons to references (would that require arbitrary ID
>>>>>> construction?), or something similar. Though I am not sure this would
>>>>>> be a generally applicable approach for any RDF data.
>>>>>>
>>>>>> [1] http://www.xml.com/pub/a/2004/02/04/foaf.html
>>>>>>
>>>>>> Regards,
>>>>>> Lukas
>>>>>>
>>>>>> On Fri, Sep 26, 2014 at 9:28 AM, joerg...@gmail.com <
>>>>>> joerg...@gmail.com> wrote:
>>>>>>
>>>>>>> JSON-LD is perfect for ES indexing, as long as you use the "compact" 
>>>>>>> form of representation. 
>>>>>>>
>>>>>>> http://www.w3.org/TR/json-ld-api/#compaction-algorithms
>>>>>>>
>>>>>>> Example: 
>>>>>>>
>>>>>>> https://github.com/lanthaler/JsonLD/blob/master/Test/Fixtures/sample-compacted.jsonld
>>>>>>>
>>>>>>> This means you should use short field names and shorten IRIs to a
>>>>>>> prefix form. This gives a convenient mapping to ES field names (e.g.
>>>>>>> "dc:title" or "dc:creator"). The '@' fields can also be indexed, and
>>>>>>> they do not control anything special in ES (some @id may be mapped to
>>>>>>> the ES _id, but for nested structures this does not match).
>>>>>>>
>>>>>>> I use my own RDF API and transform RDF graphs (so not only JSON-LD
>>>>>>> but also other formats like N-Triples and RDF/XML) into XContent
>>>>>>> using this method:
>>>>>>>
>>>>>>> https://github.com/xbib/xbib/blob/master/content/src/main/java/org/xbib/rdf/content/DefaultResourceContentBuilder.java
>>>>>>>
>>>>>>> I plan to extend this content building by interpreting rdf:type and
>>>>>>> rdf:list etc. to generate correct ES JSON objects and arrays. There
>>>>>>> is also an amount of work left to do for the plethora of XSD types in
>>>>>>> RDF literals or for language tags.
>>>>>>>
>>>>>>> This will be subsumed into an RDF input/output plugin for an 
>>>>>>> ES-based Linked Data Platform 
>>>>>>>
>>>>>>> http://www.w3.org/TR/ldp/
>>>>>>>
>>>>>>> but there is no ETA yet.
>>>>>>>
>>>>>>> Jörg
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 26, 2014 at 5:08 AM, Lukáš Vlček <lukas...@gmail.com> 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I think you will have to preprocess the documents on your side first
>>>>>>>> and then push them into ES individually (you can push in batches).
>>>>>>>>
>>>>>>>> As a side note, I would say JSON-LD is quite a low-level
>>>>>>>> serialization of RDF data and, IMO, not optimal for ES indexing. It
>>>>>>>> might be better to find some RDF-OOM tool, have your RDF documents
>>>>>>>> mapped to Java POJOs, and serialize the POJOs into JSON instead (you
>>>>>>>> can use the Jackson library for that, for example). This will give
>>>>>>>> you better control over the whole RDF -> JSON conversion process.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Lukas
>>>>>>>>
>>>>>>>> On Thu, Sep 25, 2014 at 7:21 PM, abo <a...@datavolution.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I'm new to Elasticsearch, so forgive me if this is a basic 
>>>>>>>>> question or if it's in some documentation that I haven't read...
>>>>>>>>>
>>>>>>>>> I am trying to load a json-ld file into ES. The json-ld file was 
>>>>>>>>> generated from an RDF file, using Jena. The structure starts with:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>   "@graph" :
>>>>>>>>>
>>>>>>>>> followed by the individual "documents", each with:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>     "@id" :
>>>>>>>>>
>>>>>>>>> and a variable number of parameters in each.
>>>>>>>>>
>>>>>>>>> My question is how do I load this into ES and ensure that
>>>>>>>>> documents are individually referenced (as opposed to the entire
>>>>>>>>> json-ld file)?
>>>>>>>>>
>>>>>>>>> Do I need to doctor this json-ld file further in order to load it?
>>>>>>>>>
>>>>>>>>> Thanks for your help.
>>>>>>>>>
>>>>>>>>> -- abo
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>>> Groups "elasticsearch" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>> https://groups.google.com/d/msgid/elasticsearch/ec26bbe7-5bb
>>>>>>>>> 1-4c50-96c4-8f586e1e0807%40googlegroups.com 
>>>>>>>>> <https://groups.google.com/d/msgid/elasticsearch/ec26bbe7-5bb1-4c50-96c4-8f586e1e0807%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>
>

