Hi Mark,

Sorry for the late reply, but with ApacheCon last week I have a lot of things to catch up on ...
If you are looking to manage your RDF datasets and run SPARQL queries on them, I suggest using Apache Marmotta or Apache Jena. Marmotta also implements the Linked Data Platform (LDP), which defines how to manage RDF data.

Managing RDF vocabularies is not within the focus of Apache Stanbol. The Entityhub supports it for two reasons:

1. Triple stores are not fast enough for the kind of queries entity extraction requires.
2. At the time the Entityhub was implemented there was no LDP, so managing user vocabularies seemed like a nice feature to have.

Nowadays I would recommend using LDP and using the Entityhub just as a secondary index.

So if you want to extract persons and roles from text documents, here is how you can do it. This assumes that you manage your vocabularies outside of Apache Stanbol (e.g. in Apache Marmotta).

You can index your FOAF vocabulary with the person data and your SKOS thesaurus with the roles in a single Site or in multiple Sites. You will want to configure a ManagedSite [1] with a Solr Yard as backend. You can update single entities (as they change) or update the whole RDF graph: triple stores provide services to export a single resource and/or a whole graph, and the ManagedSite allows you to update a single entity and/or to delete all entities and then re-import the whole RDF graph. So what you will need is a component that performs such updates when you need them.

Additional notes:

* Entity extraction uses rdfs:label by default, and the default configuration of the ManagedSite maps some properties to rdfs:label. Those defaults should be fine for your use case.
* For the persons you might also need a foaf:name field containing the concatenation of foaf:firstName and foaf:lastName in your data (e.g. <foaf:name>John Smith</foaf:name>). If you also want to extract persons based on their last name alone, you will need to add the corresponding mapping to the configuration of the ManagedSite.
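The update component described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the Stanbol base URL, the site id `people`, the `/entityhub/site/{siteId}/entity` resource with an `id` parameter (and `id=*` for deleting all entities) and the `text/rdf+nt` content type are assumptions to check against the ManagedSite documentation [1], and all helper names are hypothetical. It also adds the concatenated foaf:name from the second note, assuming foaf:name is one of the properties your ManagedSite maps to rdfs:label. The requests are only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical settings: adjust to your Stanbol instance and ManagedSite id.
STANBOL_URL = "http://localhost:8080"
SITE_ID = "people"

FOAF = "http://xmlns.com/foaf/0.1/"
# ASSUMPTION: N-Triples media type accepted by the Entityhub.
NTRIPLES = "text/rdf+nt"


def nt_literal(value):
    """Serialise a plain string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return '"' + escaped + '"'


def person_ntriples(uri, first_name, last_name):
    """Serialise one person as N-Triples and add foaf:name as "first last",
    so that a full-name label is available for entity extraction."""
    subject = "<" + uri + ">"
    triples = [
        (FOAF + "firstName", first_name),
        (FOAF + "lastName", last_name),
        (FOAF + "name", first_name + " " + last_name),
    ]
    return "".join(f"{subject} <{p}> {nt_literal(o)} .\n" for p, o in triples)


def entity_url(query):
    # ASSUMPTION: ManagedSite entity resource as described in the ManagedSite docs.
    return f"{STANBOL_URL}/entityhub/site/{SITE_ID}/entity?{query}"


def update_entity_request(uri, ntriples):
    """Build (but do not send) a PUT that updates a single entity on the ManagedSite."""
    return Request(entity_url(urlencode({"id": uri})),
                   data=ntriples.encode("utf-8"),
                   headers={"Content-Type": NTRIPLES},
                   method="PUT")


def delete_all_request():
    """Build (but do not send) a DELETE that removes all entities, e.g. before
    re-importing the whole graph exported from the triple store.
    ASSUMPTION: id=* addresses all entities of the site."""
    return Request(entity_url("id=*"), method="DELETE")
```

Sending a request is then just `urllib.request.urlopen(req)`. The export side (fetching a single resource or the whole graph from the triple store, e.g. via LDP in Marmotta) is left out; keeping the foaf:name concatenation in this component leaves the vocabulary in the triple store untouched, which fits the idea of using the Entityhub only as a secondary index.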
With this in place you will get Persons and Roles extracted from parsed texts.

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/components/entityhub/managedsite.html

On Thu, Nov 13, 2014 at 8:12 AM, <[email protected]> wrote:
> Hi, I have read the documentation as you directed, and have one further question just to clarify. I am asking this as I am less experienced with this sort of problem.
>
> I believe the example FOAF graph I have given as a way of storing the data is probably inefficient, as the way I am storing the role someone has in the FOAF graph means I have to 'look up' the entity every time someone does a search to match both the literal string in the FOAF graph, and the literal string that someone tries to search for.
>
> Would it be better to do something more similar to this example, where I define a URI for 'Managing Director' myself:
>
> Example graph, showing person with role:
>
> <foaf:Person rdf:ID="johnsmith">
>   <foaf:firstName>John</foaf:firstName>
>   <foaf:lastName>Smith</foaf:lastName>
>   <ex:role rdf:resource="http://www.linkeddatatools.com/role/managingdirector"/>
> </foaf:Person>
>
> And then in my SKOS vocabulary I would define:
>
> ex:managingdirector rdf:type skos:Concept;
>     skos:prefLabel "Managing Director"@en;
>     skos:altLabel "MD"@en;
>     skos:altLabel "President"@en;
>     skos:altLabel "CEO"@en.
>
> I would then, rather than merge both graphs, upload the SKOS vocabulary to my EntityHub site. When a user made a search, I would carry out entity extraction on the search string (e.g. user searches for 'President') which would then return the http://www.linkeddatatools.com/role/managingdirector entity match.
>
> I would then use SPARQL to query the FOAF graph for those with http://www.linkeddatatools.com/role/managingdirector as ex:role.
>
> Is this not more efficient? Further, would I need to use EntityHub for this, or would it be better simply to query my SKOS vocabulary myself using SPARQL?
>
> Thanks for your patience and input on this, as I say I am relatively new to this sort of problem and really do value any advice.
>
> Best wishes
>
> Mark
>
> Quoting [email protected]:
>
>> Hi Rafa/Reto,
>>
>> Thanks very much for your replies - so I will look into:
>>
>> - Merging both graphs.
>> - Uploading to a Stanbol Entity site.
>> - Using entityhub/site/find/ in the documentation to return the subjects that match an ex:role with that SKOS label.
>>
>> I'm not clear from the replies how I would use SPARQL, if you have any further guidance then please let me know if this is a better option.
>>
>> Again thanks,
>>
>> Mark
>>
>> Quoting Rafa Haro <[email protected]>:
>>
>>> Hi Mark, Reto,
>>>
>>> On 12 November 2014 at 11:45:43, Reto Gmür ([email protected]) wrote:
>>>
>>> On Wed, Nov 12, 2014 at 10:16 AM, Rafa Haro <[email protected]> wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> You can solve your problem in Stanbol if you link or merge together both graphs in a single one and you create a site with it. After indexing the merged graph, you can use the EntityHub API and specifically the find (/entityhub/site/find) service to search for your label and then move to all the nodes associated with that SKOS label using an LDPath expression. Please take a look at the EntityHub REST API documentation.
>>>>
>>>
>>> Just for completeness: After merging the two graphs (or even without) you can also use SPARQL.
>>>
>>> Yep, that’s true :-). I probably forgot to mention that if you are planning to enrich documents using both graphs, the LDPath approach is also available.
>>>
>>> Cheers,
>>> Rafa
>>>
>>> Cheers,
>>> Reto
>>>
>>>>
>>>> Hope that helps. Cheers,
>>>> Rafa
>>>>
>>>> On 11 November 2014 at 20:34:01, [email protected] ([email protected]) wrote:
>>>>
>>>> Hi, here is an example of what I'm trying to achieve. Does Fusepool, or another solution, achieve this goal?
>>>>
>>>> I have an RDF graph in a graph store:
>>>>
>>>> ==============================
>>>>
>>>> <foaf:Person rdf:ID="johnsmith">
>>>>   <foaf:firstName>John</foaf:firstName>
>>>>   <foaf:lastName>Smith</foaf:lastName>
>>>>   <ex:role>Managing Director</ex:role>
>>>> </foaf:Person>
>>>>
>>>> ==============================
>>>>
>>>> I have the following SKOS vocabulary:
>>>>
>>>> ==============================
>>>>
>>>> ex:role rdf:type skos:Concept;
>>>>     skos:prefLabel "Managing Director"@en;
>>>>     skos:altLabel "MD"@en;
>>>>     skos:altLabel "President"@en;
>>>>     skos:altLabel "CEO"@en.
>>>>
>>>> ==============================
>>>>
>>>> If I search for anyone with the role 'President', I want to return John Smith (rdf:ID="johnsmith") - because 'President' is an alternative label for 'Managing Director'.
>>>>
>>>> Is this possible using an already established best practice, or framework?
>>>>
>>>> Please let me know if any further examples are required.
>>>>
>>>> Best wishes
>>>>
>>>> Mark
>>>>
>>>> Quoting Reto Gmür <[email protected]>:
>>>>
>>>>> Hi Linked Data Tools
>>>>>
>>>>> One difficulty might arise because ContentHub has the index and the facets in lucene only and other metadata in an RDF graph. So for example if contenthub provides a facet "Paris" you only have the label without any association to the URI, so it won't be possible to get additional properties of the resource. This is why in the fusepool project we've chosen to build a store that stores all the data in an RDF graph and builds a lucene index on top of it. The code is here https://github.com/fusepool/fusepool-ecs, it's Apache licensed and, btw, fusepool would be happy to donate it to the stanbol project.
>>>>>
>>>>> Cheers,
>>>>> Reto
>>>>>
>>>>> On Mon, Nov 10, 2014 at 7:54 PM, <[email protected]> wrote:
>>>>>
>>>>>> Hi, I posted a similar message to the IKS mailing list, but understand from the response that this mailing list is no longer administrated.
>>>>>>
>>>>>> Stanbol is a great tool and I'm having some success with it; particularly the entity extractor tool.
>>>>>>
>>>>>> I have a requirement and I am not sure of the best way to approach it, or whether a best practice for this sort of problem has already been established.
>>>>>>
>>>>>> I have an RDF graph - one in accordance with the FOAF ontology - and I have a controlled vocabulary in the form of a SKOS RDF graph, which contains a set of literal string terms and their semantic equivalents (e.g. 'President' <-> 'Managing Director' <-> 'Chief Executive' <-> 'MD' <-> etc.).
>>>>>>
>>>>>> I would like to search the literal strings in the FOAF graph for the occurrence of the string literals, and their equivalents as defined by the SKOS thesaurus.
>>>>>>
>>>>>> I can suggest one approach to this problem, but I fear it may be quite inefficient and take a long time, namely:
>>>>>>
>>>>>> - Query the RDF graph using SPARQL for all string literals.
>>>>>> - Pass each string literal to the Stanbol Entity Extractor, having uploaded the SKOS thesaurus to the Stanbol Entity Hub.
>>>>>>
>>>>>> Now this seems quite long-winded. Further, I'm not even clear from the documentation whether the Stanbol Entity Extractor is capable of using SKOS vocabularies to map string literals to entities. Is Stanbol capable of extracting entities using a SKOS vocabulary?
>>>>>>
>>>>>> This seems a fairly common thing to do (semantic search of an RDF graph using a thesaurus) - is there some better way of solving this problem using an already established strategy?
>>>>>>
>>>>>> Many thanks!
>>>>>>
>>>>>> Linked Data Tools

--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/
