Re: Replace or augment UIMA/OpenNLP pipeline with Stanbol

Rupert Westenthaler Thu, 27 Sep 2012 23:43:25 -0700

Hi Wayne,

On Thu, Sep 27, 2012 at 6:17 PM, Wayne Rasmuss
<wayne.rasm...@perceptivesoftware.com> wrote:
> I've been working with UIMA and OpenNLP together. Basically I've got the
> OpenNLP/UIMA example working. This gives me annotated text with tokens,
> sentences, parts of speech, chunks (verb phrase, noun phrase, etc.) It also
> attempts organizations, dates and locations though I don't get reliable
> results with them. Mostly I'm interested in parts of speech and chunks
> anyway.
>

Word-level NLP annotations are currently not included in the
Enhancement Results, mainly because this would result in 20+
triples per word. However, STANBOL-733 "Stanbol NLP processing"
will add this feature. Development is done in a separate
branch [1]. This branch also includes its own Stanbol Launcher that
makes it easy to test the current state of development (build and
start the launcher and then post some text to
http://localhost:8080/enhancer/chain/nlp-processing).
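
As a rough illustration of "post some text", here is a minimal Python
sketch using only the standard library. The URL is the one given above.
The function names (build_request, enhance), the "text/turtle" Accept
header and the UTF-8 charset are assumptions made for this example, not
something the branch prescribes, so adjust them to whatever
serialization you actually want back.

```python
import urllib.request

# URL from the message above; assumes the launcher built from the
# stanbol-nlp-processing branch is running locally on port 8080.
NLP_CHAIN_URL = "http://localhost:8080/enhancer/chain/nlp-processing"


def build_request(text, accept="text/turtle", url=NLP_CHAIN_URL):
    """Build (but do not send) a POST request carrying the plain text.

    The Accept value is an assumption: it asks for the RDF result
    serialized as Turtle.
    """
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": accept,
        },
        method="POST",
    )


def enhance(text, accept="text/turtle", url=NLP_CHAIN_URL, timeout=60):
    """Send the text to the chain and return the response body as a string."""
    request = build_request(text, accept, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the launcher running, calling enhance("Some English text.") should
return the RDF produced by the chain as a string. Nothing is sent until
enhance() is called, so build_request() can be inspected on its own.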

I will give you a short overview. Details can be found in JIRA:

* AnalysedText: Java Domain Model that represents results of NLP. The
AnalysedText is added to the ContentItem as ContentPart (see
STANBOL-734 for code examples)
* NLP 2 RDF: This is an EnhancementEngine that converts the
information of the AnalysedText to RDF by using NIF (NLP Interchange
Format), a set of OWL ontologies for formally representing NLP
results (see STANBOL-741). NOTE that the NLP results provided by the
nlp-processing chain of the Stanbol Launcher already use NIF.
* opennlp.pos: This EnhancementEngine supports POS tagging of parsed
texts in all languages supported by OpenNLP (STANBOL-735). As part of
that it also detects and adds Sentence annotations.
* opennlp.chunker: This EnhancementEngine consumes Tokens and POS tags
and performs chunking (STANBOL-736). Chunking is supported for English
and German.
* sentiment.wordclassifier: This EnhancementEngine adds sentiment tags
at the word level (based on SentiWordNet for English and SentiWS for
German).

You might also have a look at a presentation [2] about the Stanbol NLP
processing module I gave at the MOLDE workshop this week in Leipzig.

[1] http://svn.apache.org/repos/asf/stanbol/branches/stanbol-nlp-processing/
[2] http://stanbol.apache.org/presentations/Stanbol_NLP_processing_2012-09.pdf

> I've been looking around and Stanbol looks like it may be easier to deal
> with and give me more advanced capabilities. I've done the first part of
> the getting started guide, but not the "full" version. I got he web
> interface up and was able to get some enhanced text. So that was great.
>
> After that I'm kind of stumped. I would like to get the annotated text
> (like I'm getting from UIMA/OpenNLP) so we can do analysis on it. Can
> someone help get started with setting up/calling stanbol so I can get the
> details in the enhanced result?
>

If you want to stay with the RESTful service you will need to
code against the NIF output generated by the "NLP2RDF" engine. If you
plan to access the Stanbol Enhancer via its Java API, I think that the
API of the AnalysedText (STANBOL-734) should give you everything you
need.

You might also want to consider implementing your own analysis as a
Stanbol EnhancementEngine. This blog post [3] provides a good
introduction on how to do that.

[3] http://blog.iks-project.eu/getting-started-with-apache-stanbol-enhancement-engine/

>
> We're working with Groovy as our glue code. Bertrand provided me with this
> example.https://gist.github.com/2931050 which looks very promising, I think
> what I need to do is basically add OpenNLP enhancers here and figure out
> how to call it.
>

The "opennlp.pos" and "opennlp.chunker" Engines should provide exactly
the information you are looking for. AFAIK the Apache Camel example
provided by Bertrand should allow you to call the corresponding
Engines/Chain and also support direct access to the results stored in
the AnalysedText content part. But as I am not familiar with Camel it
would be good if Bertrand could confirm this.

Please NOTE that the Stanbol NLP processing is still under heavy
development, so things might still change. The current plan is to have
a first, reasonably stable version of STANBOL-733 available in the
trunk by the end of October.

best
Rupert

-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
