Re: Loading JSON-LD into ES

joergpra...@gmail.com Sat, 27 Sep 2014 09:24:48 -0700

For the _mapping, I am thinking about two more types, "iri" and "literal",
for which I intend to write ES type mappers, so ES can receive XSD data types
and language codes and map them to fields / analyzers. IRIs are just opaque
strings, but they can be shortened if a prefix is configured, and they can be
used as an _id or for referencing an _id.
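
To illustrate the prefix shortening, here is a minimal sketch in Python. It
is not the planned type mapper: the function name shorten_iri and the PREFIXES
table are hypothetical, and it simply assumes that the longest matching
configured namespace should win.

```python
def shorten_iri(iri, prefixes):
    """Return 'prefix:local' if a configured namespace matches, else the IRI unchanged."""
    best = None
    for prefix, namespace in prefixes.items():
        # Prefer the longest matching namespace so nested namespaces resolve predictably.
        if iri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return iri  # no prefix configured: keep the opaque string
    prefix, namespace = best
    return f"{prefix}:{iri[len(namespace):]}"


# Hypothetical prefix configuration, for illustration only.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

short_field = shorten_iri("http://purl.org/dc/elements/1.1/title", PREFIXES)
```

The shortened form could then serve as a field name, and a full or shortened
subject IRI as the _id.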


Instead of _mapping, I prefer the idea of handling @contexts like
template documents.
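
A rough sketch of what "template" could mean here, in Python: a stored
@context is attached to an ES hit so that the result reads as compact JSON-LD
again. The function name to_jsonld and the sample values are assumptions; the
hit is reduced to its _id and _source, and no real plugin API is implied.

```python
def to_jsonld(hit, context):
    """Attach a stored @context to an ES hit (a dict with '_id' and '_source')."""
    doc = dict(hit["_source"])         # copy, so the hit itself is not modified
    doc["@context"] = context          # the stored context acts as the template
    doc.setdefault("@id", hit["_id"])  # keep an @id already present in _source
    return doc


# Hypothetical stored context and search hit, for illustration only.
stored_context = {"dc": "http://purl.org/dc/elements/1.1/"}
hit = {"_id": "http://example.org/book/1", "_source": {"dc:title": "Example"}}
jsonld_doc = to_jsonld(hit, stored_context)
```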

I am not sure about the best way to manage JSON-LD. There are two approaches.
One is to save the JSON-LD (what you call the original document) beside other
versions. This requires more space, and I'm not sure about the purpose of
keeping the original JSON-LD. The other approach is to drop the original
JSON-LD after parsing it to triples and to store the triples in an ES JSON
doc, a surrogate that is close to JSON-LD but conforms to all the JSON
dialect characteristics of the ES document DSL.
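
The second approach can be sketched in Python as follows. This is only an
illustration under simple assumptions: triples are plain (subject, predicate,
object) string tuples that are already parsed and shortened, and the name
triples_to_docs is hypothetical. Nested resources, XSD types and language
tags are not handled.

```python
from collections import defaultdict


def triples_to_docs(triples):
    """Group triples by subject into one flat JSON-like doc per subject."""
    docs = defaultdict(dict)
    for subject, predicate, obj in triples:
        fields = docs[subject]
        if predicate not in fields:
            fields[predicate] = obj            # first value: keep as a scalar
        elif isinstance(fields[predicate], list):
            fields[predicate].append(obj)      # third and later values
        else:
            fields[predicate] = [fields[predicate], obj]  # second value: switch to a list
    return dict(docs)


# Hypothetical sample data, for illustration only.
sample_triples = [
    ("ex:a", "dc:title", "T"),
    ("ex:a", "dc:creator", "X"),
    ("ex:a", "dc:creator", "Y"),
    ("ex:b", "dc:title", "U"),
]
docs_by_subject = triples_to_docs(sample_triples)
```

Each key of the result is a candidate _id, and each value is a candidate
document body.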

I'm not into Scala, so I cannot promise much, but I'd be happy to have a look
at all related code!

Jörg


On Sat, Sep 27, 2014 at 4:50 PM, Alfredo Serafini <ser...@gmail.com> wrote:

> Hi Jörg, indeed! :-)
>
> What I like about _mappings is that they are managed as documents too, and
> they can be:
>
>    1. automatically inferred from data (at risk, but useful)
>    2. provided by static files, in some cases
>    3. managed for _index/_types
>
> All of those things could be done with something like a _context (which
> would at first include a single @context). The first point should probably
> be avoided altogether for json-ld :-), but it should be possible.
>
> But we may need more than one @context item for a single "resource" schema
> (referring to an _index/_type), and looking ahead it's even possible to
> re-use a @context for different _index/_type pairs.
> Furthermore, when exposing results in json-ld one might want to reference
> an external @context and merge it before providing the results. In my
> opinion the "riskier" part is the input of the original json-ld, if we
> want to flatten it and extract the @context that will permit us to
> reconstruct the original document later.
> Given that it may be possible to map every kind of JSON result from ES,
> documents imported as json-ld might have to keep at least their original
> fields.
>
> I'd like to put some code on GitHub, and if you want we could join
> efforts on that. I'm working mostly in Scala at the moment. What do you
> think?
>
>
>
>
> On Friday, 26 September 2014 20:32:52 UTC+2, Jörg Prante wrote:
>>
>> Absolutely. My thought is about managing one (or more) context ES JSON
>> document(s) where all the @context definitions of an index live. A format
>> plugin can then process search results, convert ES JSON to expanded
>> JSON-LD, and from there to other RDF serializations.
>>
>> Jörg
>>
>> On Fri, Sep 26, 2014 at 6:23 PM, Alfredo Serafini <ser...@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> using json-ld is indeed rather simple, as it is JSON, so it's even
>>> possible to index it as is.
>>> I'm currently using ES for storing RDF documents in json-ld in a
>>> specific index: in that case one can simply use the URI as the _id, recover
>>> the full original format from _source, and use basic search capabilities on
>>> the index, if escaping / nesting is not a big deal.
>>>
>>> However, in order to use resources with some more flexibility, I think
>>> the best would be to index them as "flat" as possible, then apply an ad-hoc
>>> @context to the ES JSON to obtain the original json-ld again.
>>> This would be my ideal usage at the moment. It seems complex at first, but
>>> it's not: I'm currently experimenting with saving a @context for a _type,
>>> obtaining, let's say, a sort of _context, similar to a _mapping, to
>>> reconstruct the original semantics.
>>> If someone likes the idea, I'd like to share thoughts on that :-)
>>>
>>>
>>> On Friday, 26 September 2014 14:08:07 UTC+2, Jörg Prante wrote:
>>>>
>>>> Lukáš,
>>>>
>>>> of course you are right, RDF/XML looks complex and requires parsing.
>>>> The underlying principle of all RDF is a graph (or a series of triples in
>>>> the form of subject/predicate/object, where the triple series is a
>>>> serialization of the graph). So the challenge is first parsing the RDF
>>>> input, second constructing the model, and third serializing the model
>>>> to an ES-friendly input (here: JSON-LD, sort of). RDF ensures that there is
>>>> a single model for all serializations.
>>>>
>>>> This technical perspective does not necessarily solve all challenges
>>>> that are inherent to the chosen data model, for example nested resources
>>>> in RDF. It might be feasible to flatten nested resources by their
>>>> identifiers and generate one JSON document after the other. Or it could be
>>>> feasible to keep nested resources intact and wrap them into nested
>>>> structures in a single ES JSON object.
>>>>
>>>> In my data model, I can map RDF subject IDs to ES doc IDs. Other data
>>>> models may prefer other approaches to select ES doc IDs.
>>>>
>>>> Jörg
>>>>
>>>>
>>>>
>>>> On Fri, Sep 26, 2014 at 10:11 AM, Lukáš Vlček <lukas...@gmail.com>
>>>> wrote:
>>>>
>>>>> Jörg,
>>>>>
>>>>> my concern is that RDF/XML allows one thing to be expressed in several
>>>>> ways. For example, if you take the FOAF specification, there are several
>>>>> ways to express that one Person knows another Person. One way is using
>>>>> reference IDs, the other is nesting a Person inside another Person. See
>>>>> [1] for examples. My understanding is that although both ways express
>>>>> exactly the same information, they lead to different XML representations
>>>>> and thus to different JSON-LD. Note that you can push such data into ES,
>>>>> but I wonder whether you can then have any consistent way of querying it.
>>>>>
>>>>> Maybe there is some way to preprocess the XML document and
>>>>> convert all nested Persons to references (would that require arbitrary ID
>>>>> construction?). Or something similar. Though I am not sure this would be
>>>>> a generally applicable approach for any RDF data.
>>>>>
>>>>> [1] http://www.xml.com/pub/a/2004/02/04/foaf.html
>>>>>
>>>>> Regards,
>>>>> Lukas
>>>>>
>>>>> On Fri, Sep 26, 2014 at 9:28 AM, joerg...@gmail.com <
>>>>> joerg...@gmail.com> wrote:
>>>>>
>>>>>> JSON-LD is perfect for ES indexing, as long as you use the "compact"
>>>>>> form of representation.
>>>>>>
>>>>>> http://www.w3.org/TR/json-ld-api/#compaction-algorithms
>>>>>>
>>>>>> Example:
>>>>>>
>>>>>> https://github.com/lanthaler/JsonLD/blob/master/Test/Fixtures/sample-compacted.jsonld
>>>>>>
>>>>>> This means you should use short field names and shorten IRIs to a
>>>>>> prefix form. This gives a convenient mapping to ES field names (e.g.
>>>>>> "dc:title" or "dc:creator"). The '@' fields can also be indexed, and they
>>>>>> do not control anything special in ES (some @id may be mapped to the ES
>>>>>> _id, but for nested structures this does not match).
>>>>>>
>>>>>> I use my own RDF API and transform RDF graphs (so not only JSON-LD
>>>>>> but also other formats like N-Triples and RDF/XML) into XContent using
>>>>>> this method:
>>>>>>
>>>>>> https://github.com/xbib/xbib/blob/master/content/src/main/java/org/xbib/rdf/content/DefaultResourceContentBuilder.java
>>>>>>
>>>>>> I plan to extend this content building by interpreting rdf:type and
>>>>>> rdf:list etc. to generate correct ES JSON objects and arrays. There is
>>>>>> also an amount of work left to do for the plethora of XSD types in RDF
>>>>>> literals or for language tags.
>>>>>>
>>>>>> This will be subsumed into an RDF input/output plugin for an ES-based
>>>>>> Linked Data Platform
>>>>>>
>>>>>> http://www.w3.org/TR/ldp/
>>>>>>
>>>>>> but there is no ETA yet.
>>>>>>
>>>>>> Jörg
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 26, 2014 at 5:08 AM, Lukáš Vlček <lukas...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I think you will have to preprocess the documents on your side first
>>>>>>> and then push them into ES individually (you can push in batches).
>>>>>>>
>>>>>>> As a side note, I would say json-ld is quite a low-level serialization
>>>>>>> of RDF data, IMO not optimal for ES indexing. It may be better to find
>>>>>>> some RDF-OOM tool, have your RDF documents mapped to Java POJOs, and
>>>>>>> serialize the POJOs into JSON instead (you can use the Jackson library
>>>>>>> for that, for example). This will give you better control over the whole
>>>>>>> RDF -> JSON conversion process.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Lukas
>>>>>>>
>>>>>>> On Thu, Sep 25, 2014 at 7:21 PM, abo <a...@datavolution.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I'm new to Elasticsearch, so forgive me if this is a basic question
>>>>>>>> or if it's in some documentation that I haven't read...
>>>>>>>>
>>>>>>>> I am trying to load a json-ld file into ES. The json-ld file was
>>>>>>>> generated from an RDF file, using Jena. The structure starts with:
>>>>>>>>
>>>>>>>> {
>>>>>>>>   "@graph" :
>>>>>>>>
>>>>>>>> followed by the individual "documents", each with:
>>>>>>>>
>>>>>>>> {
>>>>>>>>     "@id" :
>>>>>>>>
>>>>>>>> and a variable number of parameters in each.
>>>>>>>>
>>>>>>>> My question is how do I load this into ES and ensure that documents
>>>>>>>> are individually referenced (as opposed to the entire json-ld file)?
>>>>>>>>
>>>>>>>> Do I need to doctor this json-ld file further in order to load it?
>>>>>>>>
>>>>>>>> Thanks for your help.
>>>>>>>>
>>>>>>>> -- abo
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "elasticsearch" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/elasticsearch/ec26bbe7-5bb
>>>>>>>> 1-4c50-96c4-8f586e1e0807%40googlegroups.com
>>>>>>>> <https://groups.google.com/d/msgid/elasticsearch/ec26bbe7-5bb1-4c50-96c4-8f586e1e0807%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

