Hi Luigi,
Now I understand what you are doing. Pretty cool feature IMO!
Your EnhancementEngine that adds those "<entity_1> <owl:sameAs>
<entity_2>" statements to the metadata of the ContentItem should also
create a "fise:Enhancement" for each of them.
As there is no fitting Annotation defined in the enhancement ontology
I would suggest that you create your own. Something like an
EntitySimilarityAnnotation. It could define
* fise:entity-reference: This property could be used to link to all
Entities that are marked as similar
* similarity-type (or maybe dc:type): This property stores the RDF
property used to link the entities (owl:sameAs in your case). If you
create an OWL file, make sure it is an Annotation Property, because
otherwise OWL-DL reasoners would become very unhappy with your
Ontology ^^
* dc:relation: This property should be used to link to all
fise:TextAnnotation and fise:EntityAnnotation used as input for
calculating the similarity (if applicable)
* fise:confidence: You should use this to store the confidence of the
linking (if available from SILK). Stanbol expects values to be in the
range [0..1]
If you know which rules caused the Entities to be linked by SILK you
could also add those as additional properties.
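To make the list above more concrete, here is a minimal sketch of the triples such an annotation could carry. It is plain Java with triples modelled as strings, so it runs without Stanbol or Clerezza; it is not how the engine itself would write to the MGraph. The fise, dc and owl namespaces are the real ones, while the "http://example.org/ontology/" namespace, the class name and the describe(..) method are only assumptions for illustration. I used dc:type for the similarity type here.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch: triples are modelled as strings so the example runs
// without Stanbol or Clerezza on the classpath.
class EntitySimilaritySketch {

    static final String FISE = "http://fise.iks-project.eu/ontology/";
    static final String DC = "http://purl.org/dc/terms/";
    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";
    static final String RDF_TYPE =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // ASSUMPTION: hypothetical namespace for the custom annotation type.
    static final String EX = "http://example.org/ontology/";

    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return "<" + subject + "> <" + predicate + "> " + object + " .";
        }
    }

    // Builds the triples describing one similarity annotation.
    static List<Triple> describe(String annotation, String entity1,
            String entity2, List<String> inputs, double confidence) {
        // Stanbol expects confidence values in the range [0..1]
        if (confidence < 0.0 || confidence > 1.0) {
            throw new IllegalArgumentException("confidence must be in [0..1]");
        }
        List<Triple> triples = new ArrayList<>();
        triples.add(new Triple(annotation, RDF_TYPE,
            EX + "EntitySimilarityAnnotation"));
        // all entities that are marked as similar
        triples.add(new Triple(annotation, FISE + "entity-reference", entity1));
        triples.add(new Triple(annotation, FISE + "entity-reference", entity2));
        // the RDF property used to link the entities
        triples.add(new Triple(annotation, DC + "type", OWL_SAME_AS));
        // the annotations used as input for calculating the similarity
        for (String input : inputs) {
            triples.add(new Triple(annotation, DC + "relation", input));
        }
        triples.add(new Triple(annotation, FISE + "confidence",
            Double.toString(confidence)));
        return triples;
    }

    public static void main(String[] args) {
        List<String> inputs = new ArrayList<>();
        inputs.add("urn:enhancement:1");
        inputs.add("urn:enhancement:2");
        for (Triple t : describe("urn:enhancement:3", "urn:entity:1",
                "urn:entity:2", inputs, 0.8)) {
            System.out.println(t);
        }
    }
}
```

In a real engine the same statements would be added to ci.getMetadata() with proper UriRef and literal values instead of strings.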
To make sure that all the fise:Enhancement specific properties are
initialized in the same way as we do it in Stanbol, I would create a
small utility class like:
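public class EntitySimilarityAnnotationHelper extends
        EnhancementEngineHelper {

    // ENTITY_SIMILARITY_ANNOTATION is the UriRef of your own annotation
    // type; RDF_TYPE comes from the Stanbol servicesapi Properties class.
    public static UriRef createEntitySimilarityAnnotation(ContentItem ci,
            EnhancementEngine e) {
        MGraph graph = ci.getMetadata();
        // initializes the properties common to all fise:Enhancement
        UriRef esa =
            EnhancementEngineHelper.createEnhancement(graph, e, ci.getUri());
        graph.add(new TripleImpl(esa, RDF_TYPE,
            ENTITY_SIMILARITY_ANNOTATION));
        return esa;
    }
}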
This has the advantage that if we change how Stanbol writes the
metadata for fise:Enhancement, your code will automatically adapt as
well.
If you think this kind of Annotation could be of general use, just
create a JIRA issue that describes the Annotation, explains the use
cases and also provides an example usage (take e.g. [1] as an
example). Based on that we could start the process of adding this
annotation to the Stanbol enhancement structure.
hope this answers your question
best
Rupert
[1]
http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure#fisetextannotation
On Wed, Jun 12, 2013 at 2:31 PM, Luigi Selmi <[email protected]> wrote:
> Hi Rupert,
>
> we are using an XSLT transformation that we include within an enhancement
> engine. After that we want to look for links between entities extracted
> from the content-item by the XSLT transformation and a NER engine and
> entities stored in the entityhub. We want to do that by wrapping SILK in a
> bundle connected to the Stanbol SPARQL endpoint. When the bundle (an
> enhancement engine) receives the content-item it checks for duplicates
> following the comparisons defined in the SILK configuration file. The
> result can be some triples like <entity_1> <owl:sameAs> <entity_2> where
> <entity_1> comes from the content item and <entity_2> from the entityhub.
> These triples will be added to the content-item metadata but there is not a
> clear way to create an enhancement to connect these triples to the
> content-item. Maybe one way could be by reifying them. Do you have any
> advice/suggestion ?
>
> Best
>
> Luigi
>
>
> 2013/6/12 Rupert Westenthaler <[email protected]>
>
>> Hi Luigi,
>>
>> Regarding extracting the text from XML files you might want to check
>> if the TikaEngine can be used for that.
>>
>> Note also that you can parse pre-existing annotations to the Stanbol
>> Enhancer. You might want to have a look at this example [1].
>>
>> Sorry I have not understood the part with "<entity_1> <owl:sameAs>
>> <entity_2>"
>>
>> best
>> Rupert
>>
>>
>>
>> [1]
>> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancerrest#example-4-parse-existing-free-text-annotations
>>
>> On Tue, Jun 11, 2013 at 4:27 PM, Luigi Selmi <[email protected]> wrote:
>> > Hi Rupert,
>> >
>> > thanks for your answers. I am working with XML files and want to
>> > transform them into RDF, then use a NER engine to extract entities from
>> > those properties that have text, like titles and abstracts (that we map
>> > to related DC properties). I was also searching for a way to use another
>> > engine to interlink entities extracted from the transformation and from
>> > the NER engine above using a SPARQL endpoint that will add triples like
>> > <entity_1> <owl:sameAs> <entity_2> to the content item metadata. I was
>> > thinking of creating enhancements every time a new resource is created by
>> > the transformation and an owl:sameAs relationship is found by an
>> > interlinking process. That will provide information about the document
>> > from which those triples have been extracted and the engines that made
>> > them, like what happens when a plain text file is sent to an NLP engine.
>> >
>> > Best
>> >
>> > Luigi
>> >
>> >
>> > 2013/6/11 Rupert Westenthaler <[email protected]>
>> >
>> >> Hi Luigi,
>> >>
>> >> I am not sure if I understand your question, but let me try to answer.
>> >>
>> >> Please note [1] when reading through this mail.
>> >>
>> >> fise:EntityAnnotation uses the fise:entity-reference property to link
>> >> to the URI of the suggested Entity. If your question was about what
>> >> happens if there are several fise:EntityAnnotations referring to the
>> >> same Entity (same value for the fise:entity-reference) then the answer
>> >> is - it depends on the EnhancementEngine (and the situation)
>> >>
>> >> fise:EntityAnnotation may have multiple dc:relation properties
>> >> pointing to several fise:TextAnnotations - in this case this means that
>> >> an Entity is suggested for several mentions within the analyzed
>> >> content.
>> >>
>> >> However EnhancementEngines may also decide to create multiple
>> >> fise:EntityAnnotation instances all pointing to the same entity. This
>> >> is typically the case for disambiguation Engines (e.g. the
>> >> disambiguation-mlt engine) as those will want to note different
>> >> fise:confidence values for the different mentions linked with the same
>> >> Entity.
>> >>
>> >> The Stanbol Enhancer does not add any relations between entities. If
>> >> you see relations "<entity1> <owl:sameAs> <entity2>" then it means
>> >> that (1) dereferencing of linked Entities is enabled and (2) those
>> >> triples were already present in the knowledge base where the Entities
>> >> come from.
>> >>
>> >> Hope this answers your question
>> >> best
>> >> Rupert
>> >>
>> >> [1]
>> >> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.png
>> >>
>> >>
>> >> On Mon, Jun 10, 2013 at 7:36 PM, Luigi Selmi <[email protected]> wrote:
>> >> > Hi all,
>> >> >
>> >> > in the documentation it is not clear which properties are attached to an
>> >> > enhancement created by a linking engine after two URIs have been found
>> >> > to represent the same entity. Which are the FISE properties used to
>> >> > state that
>> >> >
>> >> > <entity1> <owl:sameAs> <entity2>
>> >> >
>> >> > where <entity1> and <entity2> are referenced by <enhancement1> and
>> >> > <enhancement2> ?
>> >> >
>> >> > Following the way in which enhancements are created in Stanbol, a
>> >> > linking engine should create a new enhancement, say <enhancement3>, with
>> >> > its confidence value, that should state in some value the fact above, but
>> >> > I couldn't find any clear statement about this in the documentation.
>> >> > Does anyone know how a linking engine works for this? Thanks in advance.
>> >> >
>> >> >
>> >> > Luigi
>> >>
>> >>
>> >>
>> >> --
>> >> | Rupert Westenthaler [email protected]
>> >> | Bodenlehenstraße 11 ++43-699-11108907
>> >> | A-5500 Bischofshofen
>> >>
>>
>>
>>
>> --
>> | Rupert Westenthaler [email protected]
>> | Bodenlehenstraße 11 ++43-699-11108907
>> | A-5500 Bischofshofen
>>
--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen