Hi, thanks for your answer.
I mean Topic Annotation.
Ultimately, what I would like to have is something like { PDFuri
foaf:primaryTopic London . } as a triple in the returned RDF.
For now, though, I’m not concerned with using FOAF.
I just want the main topics of the PDF; I don’t necessarily want to
extract all the entities, etc.
So, in terms of the generated annotations, I would want neither
fise:EntityAnnotation nor fise:TextAnnotation, but only
fise:TopicAnnotation.
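A minimal sketch of that post-processing step, assuming the enhancement RDF returned by Stanbol has already been parsed into plain (subject, predicate, object) string tuples (e.g. with whatever RDF library is at hand). It keeps only fise:TopicAnnotation nodes and rewrites each one as a foaf:primaryTopic triple on the content item, using the fise:extracted-from and fise:entity-reference properties described on the enhancement structure page. The function name and all example URIs are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (subject, predicate, object) with full URIs or literal values as strings.
Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FISE = "http://fise.iks-project.eu/ontology/"
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"


def primary_topic_triples(triples: Iterable[Triple]) -> List[Triple]:
    """Map every fise:TopicAnnotation to (content item, foaf:primaryTopic, topic)."""
    triples = list(triples)
    # Nodes typed as fise:TopicAnnotation; Text/EntityAnnotations are ignored.
    topic_nodes = {
        s for s, p, o in triples
        if p == RDF_TYPE and o == FISE + "TopicAnnotation"
    }
    extracted_from: Dict[str, Set[str]] = {}  # annotation -> content item(s)
    topics: Dict[str, Set[str]] = {}          # annotation -> topic URI(s)
    for s, p, o in triples:
        if s not in topic_nodes:
            continue
        if p == FISE + "extracted-from":
            extracted_from.setdefault(s, set()).add(o)
        elif p == FISE + "entity-reference":
            topics.setdefault(s, set()).add(o)
    result = {
        (item, FOAF_PRIMARY_TOPIC, topic)
        for node in topic_nodes
        for item in extracted_from.get(node, set())
        for topic in topics.get(node, set())
    }
    return sorted(result)


# Hypothetical enhancement results for a PDF mentioning London and Paris.
example = [
    ("urn:enhancement-1", RDF_TYPE, FISE + "TopicAnnotation"),
    ("urn:enhancement-1", FISE + "extracted-from", "urn:content-item-pdf"),
    ("urn:enhancement-1", FISE + "entity-reference", "http://dbpedia.org/resource/London"),
    ("urn:enhancement-2", RDF_TYPE, FISE + "EntityAnnotation"),
    ("urn:enhancement-2", FISE + "extracted-from", "urn:content-item-pdf"),
    ("urn:enhancement-2", FISE + "entity-reference", "http://dbpedia.org/resource/Paris"),
]
for triple in primary_topic_triples(example):
    print(triple)
```

Only the TopicAnnotation survives; the EntityAnnotation for Paris is dropped. Filtering on fise:confidence before emitting triples might be a sensible extension, since a classifier may assign several topics with different scores.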
Many thanks
--
Maatari Daniel Okouya
On 27 May 2014 at 13:08:38, Rupert Westenthaler ([email protected])
wrote:
On Tue, May 27, 2014 at 12:49 PM, Maatari Daniel Okouya
<[email protected]> wrote:
> Hi,
>
> I have just started to use Apache Stanbol. I’m still playing around with it
> to figure out everything that is out there. However, I’m puzzled by one thing.
> I would like to configure it such that, upon uploading a text or a PDF
> document, an RDF document containing only the topic of the PDF is returned.
>
What do you mean by "topic"? In the case of PDF files, the Tika Engine [1]
can extract metadata. Such metadata are added directly to the URI of
the ContentItem and do not use FISE.
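The difference between Tika metadata and FISE enhancements can be made concrete with a small sketch: triples whose subject is the ContentItem URI are treated as document metadata, while triples about nodes typed with a class from the FISE namespace are treated as enhancements. It assumes the results are available as (subject, predicate, object) string tuples and that the ContentItem URI is known; the function name, URIs and the dc:title property used in the test data are illustrative only, not what Tika necessarily emits.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FISE = "http://fise.iks-project.eu/ontology/"


def split_results(triples: Iterable[Triple],
                  content_item: str) -> Tuple[List[Triple], List[Triple]]:
    """Return (metadata about the content item, triples about FISE enhancements)."""
    triples = list(triples)
    # Any node carrying an rdf:type from the FISE namespace is an enhancement.
    enhancement_nodes = {
        s for s, p, o in triples
        if p == RDF_TYPE and o.startswith(FISE)
    }
    # Metadata (e.g. extracted by the Tika engine) hangs off the ContentItem URI.
    metadata = [t for t in triples if t[0] == content_item]
    enhancements = [t for t in triples if t[0] in enhancement_nodes]
    return metadata, enhancements
```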
> I’m scratching my head, but I don’t see how to do so. Which engine
> is supposed to produce <<fise:Annotation>>
>
All Stanbol engines generate FISE enhancements
(fise:TextAnnotation, fise:EntityAnnotation and fise:TopicAnnotation).
Looking at the list of engines [2]:
* Language Detection engines create a fise:TextAnnotation describing
the language of the document (?la dc:type dc:LinguisticSystem; ?la
dc:language ?lang).
* Named Entity Recognition (NER) engines create fise:TextAnnotations
for entities recognized by the NLP framework.
* Linking / Suggestion engines create fise:EntityAnnotations for entities
found in the text. They may also add fise:TextAnnotations to mark the
exact mentions of such entities in the text.
* Topic Classification engines use fise:TopicAnnotations to describe
assigned topics. They also use a fise:TextAnnotation to mark the part
of the text the topic is assigned to.
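To check which of these enhancement types a given configuration actually produced, the results can be grouped by FISE type. This sketch works over pre-parsed (subject, predicate, object) string tuples and also reads the detected language using the dc:type dc:LinguisticSystem / dc:language pattern from the first bullet. It assumes dc: stands for the DCMI Terms namespace (http://purl.org/dc/terms/); the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FISE = "http://fise.iks-project.eu/ontology/"
DC = "http://purl.org/dc/terms/"  # assumption: dc: is DCMI Terms

ANNOTATION_TYPES = ("TextAnnotation", "EntityAnnotation", "TopicAnnotation")


def by_annotation_type(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Group enhancement nodes by their fise:*Annotation type (local name)."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE and o.startswith(FISE):
            local = o[len(FISE):]
            # Skip generic types such as fise:Enhancement.
            if local in ANNOTATION_TYPES:
                groups[local].add(s)
    return dict(groups)


def detected_languages(triples: Iterable[Triple]) -> List[str]:
    """Return dc:language values of nodes typed dc:type dc:LinguisticSystem."""
    triples = list(triples)
    language_nodes = {
        s for s, p, o in triples
        if p == DC + "type" and o == DC + "LinguisticSystem"
    }
    return sorted({o for s, p, o in triples
                   if s in language_nodes and p == DC + "language"})
```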
> as described in
> http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html
>
Yep, this page describes the annotations created by the EnhancementEngines.
Without knowing what you mean by "... only the topic of the pdf ...",
I cannot recommend a suitable Stanbol configuration.
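Whatever configuration is chosen, the enhancer is exercised by POSTing the raw content to its REST endpoint, optionally selecting an enhancement chain via /enhancer/chain/{name}. A minimal standard-library sketch, assuming a Stanbol launcher running at http://localhost:8080 and requesting Turtle through the Accept header; the helper names and any chain name passed in are hypothetical.

```python
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080/enhancer"  # assumed local launcher


def build_enhance_request(body: bytes, content_type: str,
                          chain: Optional[str] = None,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the enhancer endpoint."""
    # Without a chain name the enhancer's default chain is used.
    url = f"{base_url}/chain/{chain}" if chain else base_url
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Accept": "text/turtle"},
    )


def enhance(body: bytes, content_type: str, chain: Optional[str] = None) -> str:
    """Send the content and return the enhancement results as Turtle text."""
    request = build_enhance_request(body, content_type, chain)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

For a PDF, reading the file as bytes and calling enhance(data, "application/pdf", chain="topics") would return the Turtle for that chain, where "topics" stands in for whatever chain is configured locally.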
best
Rupert
>
>
[1] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/tikaengine
[2] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/list
> I would appreciate it if someone could provide me with some pointers.
>
> Many thanks,
>
> Maatary
>
> --
> Maatari Daniel Okouya
--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO
..........................................................................
| http://redlink.co/