Hi Rochbenritter,
Samsung is not found because the ontology types its labels as xsd:string.
Here is the relevant excerpt of the ontology:
<gr:BusinessEntity rdf:ID="Samsung">
  <gr:legalName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Samsung Group</gr:legalName>
  <rdfs:seeAlso rdf:resource="http://www.samsung.com/"/>
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Samsung</rdfs:label>
  <rdfs:comment xml:lang="en">The business entity Samsung Group.</rdfs:comment>
  <belongsToModule xml:lang="en">MP3Player, TV, Printer, DigitalCamera, Camcorder</belongsToModule>
</gr:BusinessEntity>
I would have expected something like
<rdfs:label xml:lang="en">Samsung</rdfs:label>
or simply
<rdfs:label>Samsung</rdfs:label>
Because those labels are typed as xsd:string, the indexing tool indexes them as
<arr name="str/gr:legalName/"><str>Samsung Group</str></arr>
<arr name="str/rdfs:label/"><str>Samsung</str></arr>
whereas the fields holding natural language labels start with '@'.
For example, the rdfs:comment field is indexed as
<arr name="@en/rdfs:comment/"><str>The business entity Samsung Group.</str></arr>
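The field naming visible in these index excerpts can be summarized in a small sketch. It is inferred only from the examples in this mail (a "str/" prefix for xsd:string values, "@{lang}/" for language-tagged text, and "@/" for text without a language) and is not the actual Entityhub indexing code; the function name solr_field_name is hypothetical.

```python
# Hypothetical sketch of the Solr field naming seen in the index excerpts.
# Inferred from the examples only; NOT the actual Stanbol Entityhub code.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def solr_field_name(prop, datatype=None, lang=None):
    """Return the assumed Solr field name for a literal value of `prop`."""
    if datatype == XSD_STRING:
        # Typed xsd:string literals go to a "str/" field.
        return f"str/{prop}/"
    if datatype is None:
        # Natural language text: "@" followed by the language tag, if any.
        return f"@{lang or ''}/{prop}/"
    raise ValueError(f"datatype {datatype} is not covered by this sketch")


print(solr_field_name("rdfs:label", datatype=XSD_STRING))  # str/rdfs:label/
print(solr_field_name("rdfs:comment", lang="en"))  # @en/rdfs:comment/
print(solr_field_name("rdfs:label"))  # @/rdfs:label/
```

Only the '@' fields carry natural language text, which is why the typed labels end up in a different family of fields.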
This is also why those entities are missing from the EntityLinking
results: xsd:string values are currently not considered by the Entity
Linking engines.
IMO EntityLinking should also consider xsd:string values, so I clearly
consider this an issue of the Stanbol Entity Linking engines. I will
analyze the implementations of both the Entityhub Linking Engine and the
Lucene FST Linking Engine and see how to solve it.
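To make the problem concrete, the following is a simplified, hypothetical model of which index fields a linking engine matches labels against: only '@' fields (the current behaviour described above) versus also the "str/" fields (the proposed change). The dict layout of the document and the function label_values are illustrative assumptions; the real Entityhub Linking and Lucene FST Linking engines are configured and implemented differently.

```python
# Simplified, hypothetical model of label selection during entity linking.
# `doc` mimics the indexed Solr document as {field name: [values]}.
samsung_doc = {
    "str/gr:legalName/": ["Samsung Group"],
    "str/rdfs:label/": ["Samsung"],
    "@en/rdfs:comment/": ["The business entity Samsung Group."],
}


def label_values(doc, prop, include_xsd_string=False):
    """Collect the values of `prop` that a linking engine would match against."""
    values = []
    for field, vals in doc.items():
        # Field names look like "<prefix>/<property>/", e.g. "@en/rdfs:label/".
        prefix, _, rest = field.partition("/")
        if rest != prop + "/":
            continue
        if prefix.startswith("@"):
            values.extend(vals)  # natural language text: always considered
        elif prefix == "str" and include_xsd_string:
            values.extend(vals)  # xsd:string: only with the proposed change
    return values


print(label_values(samsung_doc, "rdfs:label"))  # []
print(label_values(samsung_doc, "rdfs:label", include_xsd_string=True))  # ['Samsung']
```

With only '@' fields considered, Samsung has no label to match at all, which matches the missing EntityLinking results.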
As a workaround, I see two possible solutions:
(a) remove all rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
attributes from the ceo.owl file
(b) apply the following mappings to the "./indexing/config/mappings.txt" file
rdfs:label | d=entityhub:text
gr:legalName | d=entityhub:text
NOTE: there is already a line "rdfs:label". You should replace this
with "rdfs:label | d=entityhub:text"
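Workaround (a) can be scripted. The minimal sketch below assumes the attribute is always spelled exactly as in the excerpt above (double quotes, full XSD namespace URI) and that ceo.owl is UTF-8 encoded; the names strip_xsd_string and strip_file are hypothetical. Keeping a backup of the original file is advisable.

```python
import re

# Matches the attribute exactly as it appears in ceo.owl, including the
# whitespace (possibly a line break) before it. Assumes double quotes and
# the full XSD namespace URI.
XSD_STRING_ATTR = re.compile(
    r'\s+rdf:datatype="http://www\.w3\.org/2001/XMLSchema#string"'
)


def strip_xsd_string(owl_text):
    """Remove every rdf:datatype="...#string" attribute (workaround (a))."""
    return XSD_STRING_ATTR.sub("", owl_text)


def strip_file(src_path, dst_path):
    """Write a copy of src_path without the xsd:string datatype attributes."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(strip_xsd_string(text))


sample = ('<rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'
          'Samsung</rdfs:label>')
print(strip_xsd_string(sample))  # <rdfs:label>Samsung</rdfs:label>
```

A plain text substitution is used instead of an XML parser so that the rest of the file (namespace prefixes, formatting) stays untouched; serializing with xml.etree.ElementTree, for example, would rewrite unregistered prefixes. Note that, like the workaround itself, this removes the datatype from every xsd:string literal in the file, not only from labels.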
For your understanding: "{field} | d=entityhub:text" tells the indexing
tool to convert the values of that field to the natural language text
datatype. This results in a Solr index that contains both the
xsd:string and the text version:
<arr name="str/rdfs:label/"><str>Samsung</str></arr>
<arr name="@/rdfs:label/"><str>Samsung</str></arr>
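The effect of the "d=entityhub:text" mapping on an indexed document can be modelled as follows. This only illustrates the result shown above (the "str/" field is kept and a "@/" text field without language is added); apply_text_mapping is a hypothetical helper and not the indexing tool's implementation, which may perform the conversion differently.

```python
# Hypothetical model of the "{field} | d=entityhub:text" mapping result.
samsung_doc = {
    "str/gr:legalName/": ["Samsung Group"],
    "str/rdfs:label/": ["Samsung"],
    "@en/rdfs:comment/": ["The business entity Samsung Group."],
}


def apply_text_mapping(doc, mapped_props):
    """Return a copy of `doc` with "@/" text copies of mapped xsd:string fields."""
    result = {field: list(vals) for field, vals in doc.items()}
    for prop in mapped_props:
        values = doc.get(f"str/{prop}/", [])
        if values:
            # Converted values carry no language tag, hence the "@/" prefix.
            result.setdefault(f"@/{prop}/", []).extend(values)
    return result


indexed = apply_text_mapping(samsung_doc, ["rdfs:label", "gr:legalName"])
print(indexed["str/rdfs:label/"])  # ['Samsung']
print(indexed["@/rdfs:label/"])  # ['Samsung']
print(indexed["@/gr:legalName/"])  # ['Samsung Group']
```

Because the new "@/" fields start with '@', they are picked up by engines that only look at natural language text.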
Thanks for your report. Before it, I was completely unaware that
xsd:string values were not considered by the EntityLinking engines.
best
Rupert
On Tue, Sep 30, 2014 at 4:33 PM, Rochbenritter . <[email protected]> wrote:
> Hi All,
>
> We followed the instructions from
> https://stanbol.apache.org/docs/trunk/customvocabulary.html to create a new
> site for the CEO ontology (
> http://www.ebusiness-unibw.org/ontologies/consumerelectronics/v1).
>
> We managed to process the ontology and to upload it as a site to the Apache
> Stanbol server, but it is not working entirely correctly. When an entry from
> the CEO ontology refers to an rdf:type defined in the
> GoodRelations ontology, the individual entry can't be found. We assume
> additional mapping is needed (e.g. "LCD" and "Ambilight 1" are found, but
> "Philips" and "Samsung" are not).
>
> You can see our configuration in this Google-Drive folder
> https://drive.google.com/folderview?id=0B59O-GwTGmjjWVNPUGhYeGJmVXM&usp=drive_web
>
> How can we fix this problem?
>
> Cheers,
--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO
..........................................................................
| http://redlink.co/